ABSTRACT

An algorithm that may be used for the classification of periodically
amplitude modulated (PAM) targets is presented. The data base used to
test the algorithm is derived from radar returns from vehicles moving
at various velocities and aspect angles, but the techniques are
applicable, as well, to other active wave devices such as sonar and laser.
The received radar signal is considered to be a time series that is a
function of target type, range, velocity, orientation, and noise.
Classification is implemented in the frequency domain; short-time spectra
are computed using the Fast Fourier Transform (FFT). Features are
extracted from the information bearing sidebands of the resulting spectra.
The radar signatures are classified using both linear discriminant and
nearest neighbor classifiers, and performance is presented for two,
three, five, and six class cases using single and sequential looks.
Probabilities of error of less than ten percent are achieved for five
or fewer classes.
I Introduction

The purpose of this dissertation is to investigate techniques for
the identification of radar targets that possess periodically amplitude
modulated signatures. This introductory chapter includes a survey of
automatic target identification with particular attention devoted to
radar. A philosophical discussion of the general pattern recognition
problem is presented, followed by a procedure for the design of pattern
recognition systems. The final section in the introduction is a summary
of the ensuing chapters.

Automatic Target Identification 

Target identification is the act of assigning a label to the output
of a specified sensor, or set of sensors, that has sampled an object
of interest. Identification necessarily follows detection; the decision
about the presence of a target must already have been made. In the
following discussion and throughout this thesis, it is assumed that a
target of interest is present and has been detected. Automatic
identification, of course, implies that a machine, rather than a human
observer, assigns the class label to the signal.

A hierarchy of levels of identification exists. If the universe of
targets of interest is the set of all aircraft, the first division might
be large aircraft vs. small aircraft. Among the class of large aircraft,
there exist two natural classes: propeller-driven and jet. The set of
large, jet aircraft may be further subgrouped into bomber and transport.
The set of aircraft called B-52 belongs to the superset jet bombers.
Various B-52 aircraft may be identified by model number, e.g. B-52B,
B-52D, B-52H, etc. The most precise identification might, for example,
assign the tail number to a particular B-52 observed. The designer of
an automatic target identification system must consider the types of
targets to be identified, the level to which they must be classified, and
the types of sensors to be used.

Most target identification schemes described in the open literature
are based on sensors that sample either acoustic or electromagnetic
waves. The sensors may be either active or passive devices. An active
device transmits a wave and receives the target's response to that wave.
The target's electromagnetic or acoustic response is the input to the
identification system. A passive device samples electromagnetic or
acoustic emissions from the target and uses these emissions to classify
the target.

Acoustic devices are used primarily underwater or on the surface of
the earth. Underwater target identification by acoustic means generally
implies sonar, either active or passive. Surface acoustic target
identification includes attempts to classify different types of vehicles
by their acoustic emissions, e.g. Nichol (Ref 48) or Thomas (Ref 65). The
identification of surface vehicles by analysis of their seismic signatures
may be possible but is limited to a few hundred meters range (Ref 3:5)
due to the rapid attenuation of such signals.

Electromagnetic target identification sensors may be classified by
the portion of the spectrum in which they operate. In the higher
frequencies one finds passive ultraviolet, optical, and infrared sensors.
The outputs of such devices are typically images that may be analyzed by
digital picture processing (Ref 59) or Fourier optics techniques (Ref 30).
Pau (Ref 51) has described the use of a laser in a target recognition
application. Radar operates in the microwave region of the electromagnetic
spectrum. Radar target identification is the subject of this thesis.


Radar Target Identification. Since its development, radar has had
one glaring limitation: the inability to identify specific targets.
Radar operators and engineers have been striving to overcome this
constraint for years (Ref 29). Skilled operators have achieved limited
success in target identification using conventional radars under certain
conditions. Azimuth resolution has been improved by using synthetic
aperture techniques where applicable. Range resolution has been
increased through pulse compression and wide bandwidth techniques. But
even if radar imagery can be made of optical quality, it would still
be desirable to perform target identification automatically in many
applications. In addition to robotic vision and perception, other
situations in which automatic identification would be desirable include
those in which the operator is relatively unskilled or must divide his
attention among many tasks.

Even when very high resolution systems are available, it may still
be more convenient to perform automatic identification in the signal
domain rather than the image domain. As will be demonstrated, under
certain conditions, features may be found in the signal domain that are
relatively invariant to target parameters such as viewing aspect. The
high resolution image of any interesting object, on the other hand, will
inevitably be aspect dependent. This parametric dependence means that a
composite hypothesis testing procedure is required for identification,
rather than a simple hypothesis test that may be used for parametrically
independent signatures.

Numerous automatic radar target recognition techniques have been
proposed and tested with varying degrees of success. One of the most
fundamental approaches (Ref 41) attempts to characterize the target's
electromagnetic response by transmitting a set of harmonically related
frequencies in the Rayleigh region, where the dimensions of the scatterer
are small compared with wavelength. Researchers at Ohio State University
have had success identifying scale model aircraft and other objects using
this approach. The obvious drawbacks to such a technique are the joint
requirements of numerous, low frequency transmitters. Other researchers
have selected features from the range trace of the target (Ref 58),
necessitating a relatively high bandwidth. Numerous methods of radar
target identification currently in use or under investigation are
included in the survey by Nahin (Ref 47).

This thesis will present a method of classifying a specific set of
radar targets, namely those that are periodically amplitude modulated.
This class of targets includes many man-made, moving objects, since they
are usually propelled by rotating structures that frequently have a radar
cross section (RCS) that is a function of the rotation.

Radar Signals. An analysis of radar signals provides insight into
what aspects of the signal may be useful for radar target identification.
A monochromatic radar transmits a signal of the form

$s(t) = a(t) \cos \omega_0 t$    (1)

where $a(t)$ is a known amplitude modulating function and $\omega_0$ is the
carrier frequency in radians per second. If the radar is of a continuous wave
(CW) type, $a(t)$ is a constant. If the radar is a simple pulsed type, $a(t)$
is a positive constant for fixed intervals separated by longer fixed
intervals where $a(t)$ is zero. For stationary radar, target, and clutter,
the received signal is

$s(t) = b(t) \cos(\omega_0 t + \theta) + n(t)$    (2)

where $\theta$ is a random variable representing phase. Electrical noise is
assumed to be an additive random process represented by $n(t)$. The
amplitude function $b(t)$ is a random process that is a function of $a(t)$,
target range, clutter, and target RCS. The target RCS is, in turn, a
function of target type, radar polarization, and viewing aspect.

If the radar remains stationary, but the target is moving at a
fixed radial velocity, the form of the returned signal, neglecting
clutter, is

$s(t) = b(t) \cos[(\omega_0 + \omega_d) t + \theta] + n(t)$    (3)

The doppler shift is given by

$\omega_d = 4 \pi v_r / \lambda$    (4)

where $v_r$ is target radial velocity, and $\lambda$ is the radar wavelength.
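
As a quick numerical check of the doppler relation, the following sketch
assumes the standard two-way monostatic form, f_d = 2 v_r / lambda in hertz
(equivalently 4 pi v_r / lambda in radians per second). The function name and
the numerical values are illustrative assumptions and are not taken from the
data base described in this thesis.

```python
import math

def doppler_shift(radial_velocity_mps, wavelength_m):
    """Return (omega_d in rad/s, f_d in Hz) for a monostatic radar.

    Assumes the two-way relation f_d = 2 * v_r / wavelength.
    """
    f_d = 2.0 * radial_velocity_mps / wavelength_m
    return 2.0 * math.pi * f_d, f_d

# Hypothetical values: 3 cm wavelength, target closing at 15 m/s.
omega_d, f_d = doppler_shift(15.0, 0.03)
print(f"f_d = {f_d:.1f} Hz, omega_d = {omega_d:.1f} rad/s")
```

For these assumed values the shift is on the order of one kilohertz, which
suggests why vehicle-speed targets are well separated from stationary clutter
in the frequency domain.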
Neglecting noise, the positive frequency half of the amplitude spectrum
of a particular realization of Eq (3) may be written as

$|S(\omega)| = \tfrac{1}{2} |B(\omega) * \delta(\omega - \omega_0 - \omega_d)| = \tfrac{1}{2} |B(\omega - \omega_0 - \omega_d)|, \quad \omega > 0$    (5)

where * represents convolution, $S(\omega)$ and $B(\omega)$ are the respective
Fourier transforms of $s(t)$ and $b(t)$, and $\delta(\cdot)$ is a Dirac delta. If $b(t)$
is constant, the spectral representation will be discrete with a single
spike at the frequency of the return. If, on the other hand, the RCS
is fluctuating, $b(t)$ will be modulated, resulting in a spreading of the
spectrum. The amplitude spectra for the three types of targets
discussed above are shown in Fig 1.
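
The three cases just described (stationary target, moving target with
constant RCS, and moving target with fluctuating RCS) can be illustrated with
a small noise-free simulation. This is only a sketch under stated assumptions:
the carrier is replaced by a low stand-in frequency so that a direct discrete
Fourier transform is practical, b(t) is taken to be a single hypothetical
cosine modulation, and all names and numerical values are illustrative.

```python
import cmath
import math

def amplitude_spectrum(x):
    """Positive-frequency half of the DFT magnitude, scaled by 1/N."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * m * k / n)
                    for k in range(n))) / n
            for m in range(n // 2)]

def radar_return(n, fs, f0, fd=0.0, fm=0.0, depth=0.0):
    """Samples of b(t) * cos(2*pi*(f0 + fd)*t) with
    b(t) = 1 + depth * cos(2*pi*fm*t); noise and phase are omitted."""
    out = []
    for k in range(n):
        t = k / fs
        b = 1.0 + depth * math.cos(2 * math.pi * fm * t)
        out.append(b * math.cos(2 * math.pi * (f0 + fd) * t))
    return out

def peaks(spec, threshold=0.05):
    """Indices of spectral bins whose magnitude exceeds the threshold."""
    return [m for m, v in enumerate(spec) if v > threshold]

N, FS = 256, 256.0   # one second of data, so bins are 1 Hz apart
stationary = peaks(amplitude_spectrum(radar_return(N, FS, 40.0)))
moving = peaks(amplitude_spectrum(radar_return(N, FS, 40.0, fd=10.0)))
modulated = peaks(amplitude_spectrum(
    radar_return(N, FS, 40.0, fd=10.0, fm=4.0, depth=0.5)))
print(stationary, moving, modulated)
```

The stationary target yields a single line at the carrier, the moving target
a single line displaced by the doppler shift, and the modulated target the
displaced line plus sidebands spaced at the modulation frequency.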

If prior knowledge exists about the form of $b(t)$, it may be used to
aid in identifying the target being illuminated. If the doppler spread
is sufficiently great, the nature of $B(\omega)$ may suggest the type of target.
This is the essence of the radar target identification technique proposed
in this thesis.

Fig 1. Typical Spectral Signatures of Radar Targets


Pattern Recognition 

For the past half of a century, numerous researchers and designers
have been preoccupied with the idea of building a machine that can
recognize patterns as humans do. Ullman (Ref 67) describes an optical
character reader designed by Tauschek in 1929 that used a simple optical
mask matching technique. Since that time, activity in the pattern
recognition field has grown exponentially.

At first glance, there may appear to be little relationship
between optical character recognition and radar target identification;
however, both endeavors may be pursued within the mathematical
framework of pattern recognition. The image of the alphabetical
character and the returned radar signal may be regarded as mathematical
functions, the former a function of two independent spatial variables
and the latter a function of one independent temporal variable. The
pattern recognition process consists of applying a series of
transformations to the functions of interest. The determination of the
best, or at least near best, transformations to apply is the art of
pattern recognition.

Virtually any general discussion of pattern recognition contains
an obligatory diagram such as Fig 2 (unless the author adopts a
syntactic approach as espoused by Fu (Ref 24)). This figure, or some
variant of it, has become the coat of arms of the pattern recognition
researcher just as Shannon's (Ref 60) schematic diagram of a general
communication system fills that role for the communications engineering
community.

The real world, when considered as a vector space of potential
measurements that could be made upon the universe, is of infinite
dimensionality. People are continually sampling the world with sensors:
cameras, microphones, accelerometers, and radars to name a few. The
output of the sensor may be discrete but it is more often analog, as is
the case for the output of a radar receiver. In either case the sensor
has reduced the dimensionality of the information at hand. The output
of the sensor, considered as a function, resides in a vector space that
is usually referred to as the measurement space. Physically, the sensor
is a transducer that converts the desired physical quantities into a
more convenient form, typically an electrical signal. Unfortunately,
the sensor invariably injects into the desired signal some sort of noise.
This noise may be due to any combination of measurement error,
quantization error, thermal noise, external electromagnetic interference,
leakage from the power supply or other source, intermodulation harmonics,
or other signal distortion due to system nonlinearities.

Fig 2. Canonical Pattern Recognition System

The blocks following the sensor in Fig 2 are somewhat arbitrary,
but they do seem to describe the three primary functions performed in
a typical pattern recognition device. The distinctions between
preprocessing, feature selection, and classification may not always be
sharp. In fact, some authors do not honor preprocessing with a major
block but relegate it to a subfunction under feature extraction.

As the block diagram implies, preprocessing is a transformation
from the measurement space to the pattern space. Typically, the front
end of the preprocessor consists of an analog-to-digital (A/D) converter,
since the sensor is usually an intrinsically analog device and the
remainder of the processing is done most conveniently in a digital system
(considering contemporary technology). Another function of the preprocessor
is the application of appropriate windows for data segmentation or
presmoothing. Preprocessing may also include applying linear
transformations to prewhiten or square up (i.e. normalize all measurements
by their respective standard deviations) the pattern space. Frequently, one
of the most important functions of the preprocessor is to expand the data
in a more convenient set of basis functions, e.g. via discrete Fourier
transformation. Finally, the preprocessor may be used to filter out
the noise and artifacts introduced by the sensor.
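
The preprocessing functions listed above (windowing, expansion in a Fourier
basis, and squaring up) may be sketched as follows. This is a minimal
illustration and not the preprocessor used in this work: the Hann window, the
segment length, the test tone, and the function names are all assumptions
chosen for brevity.

```python
import cmath
import math
import statistics

def hann(n):
    """Hann window of length n (one possible presmoothing window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def short_time_spectrum(segment):
    """Window one data segment and return its DFT magnitudes (positive half)."""
    n = len(segment)
    xw = [s * w for s, w in zip(segment, hann(n))]
    return [abs(sum(xw[k] * cmath.exp(-2j * math.pi * m * k / n)
                    for k in range(n)))
            for m in range(n // 2)]

def square_up(patterns):
    """Divide each measurement (column) by its sample standard deviation.

    Columns with zero deviation are left unscaled.
    """
    columns = list(zip(*patterns))
    sigmas = [statistics.stdev(col) or 1.0 for col in columns]
    return [[v / s for v, s in zip(row, sigmas)] for row in patterns]

# Hypothetical data: three segments of a tone at bin 8 with differing amplitude.
N = 64
segments = [[amp * math.cos(2 * math.pi * 8 * k / N) for k in range(N)]
            for amp in (1.0, 2.0, 3.0)]
spectra = [short_time_spectrum(seg) for seg in segments]
normalized = square_up(spectra)
peak_bin = max(range(N // 2), key=lambda m: spectra[0][m])
print(peak_bin)
```

After squaring up, every measurement with nonzero spread has unit sample
standard deviation across the design set, so that no single spectral bin
dominates a distance computation merely because of its scale.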

A brief digression in the form of a discussion of terminology
appears necessary at this point. In the pattern recognition literature,
"feature selection" and "feature extraction" are used synonymously to
refer to four distinct transformations. First, the act of expanding
the data in a new set of basis functions is referred to as feature
selection; this transformation is included in preprocessing here. Next,
in the design process a number of potential features are selected based
on mathematical, physical, or statistical considerations; this process
is designated design feature extraction by the present author. These
features are evaluated and some useful subset is retained for
classification. This process is termed feature selection here. Finally, in
the on-line pattern recognition system, those features that have been
selected in the design process must be extracted from the input data. This
will be called feature extraction. It could certainly be argued that the
terminology selected here is somewhat arbitrary, but at least it is
fairly descriptive and does draw a distinction between the various
feature selection/extraction types of processes.

In some applications the pattern space will be of adequately low
dimensionality, and the class representations will be sufficiently
separated that no further feature extraction is required. Such is not the
case, however, for the problem at hand and probably seldom is for
interesting pattern recognition problems. Thus the requirement for a
feature extractor is indicated. Frequently, the pattern space will be
a Euclidean space of dimension between 100 and 1000. The feature
extraction process is a dimensionality reducing transformation that should
discard information that is common to the various classes and retain that
information that bears the class discriminating capability. The feature
space is typically of a dimensional order of ten.

Finally the feature vectors are input to the classifier which
assigns class labels. Thus, the classifier, regardless of its
implementation, parses the feature space and assigns labels to the various
regions. Viewed abstractly, the pattern recognition system is a
transformation from an infinite dimensional space, the space of all possible
measurements on the universe, to a set, usually finite, of class labels.
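
One concrete way in which a classifier partitions the feature space is the
nearest neighbor rule, which labels a feature vector with the class of the
closest stored design sample. The sketch below assumes Euclidean distance and
uses hypothetical two-dimensional feature vectors and class labels; it is
offered only as an illustration of the idea.

```python
import math

def nearest_neighbor(train, x):
    """Return the label of the training vector closest to x.

    train: list of (feature_vector, label) pairs.
    """
    return min(train, key=lambda pair: math.dist(pair[0], x))[1]

# Hypothetical design set with two classes in a two-dimensional feature space.
design_set = [((0.9, 0.2), "A"), ((1.1, 0.3), "A"),
              ((0.2, 1.0), "B"), ((0.3, 0.8), "B")]
label_1 = nearest_neighbor(design_set, (1.0, 0.25))
label_2 = nearest_neighbor(design_set, (0.25, 0.9))
print(label_1, label_2)
```

The regions assigned to each label are implied by the design samples
themselves rather than by an explicit boundary, which is why the rule
requires no assumption about the form of the class statistics.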


In the following sections the various components of the pattern
recognition system will be examined in greater detail, but the
elaboration will be problem-specific. The reader who desires a more complete
treatment of the general pattern recognition problem is referred to
the texts by Andrews (Ref 1), Duda and Hart (Ref 18), Fukunaga (Ref 25),
or Meisel (Ref 44). Each of the four has certain strengths and
weaknesses. Andrews emphasizes feature extraction/selection and devotes
considerable coverage to preprocessing transformations. Duda and Hart
give good, even treatment to the subject including excellent bibliographies
and historical presentations. Fukunaga is perhaps the most abstract and
rigorous of the four, but he sometimes becomes so immersed in the
formalisms that he loses sight of the goal of pattern recognition.
Meisel is perhaps the most applications-oriented of the four. For a
more complete critique of the texts, Cover's review (Ref 135) may be
consulted. The reader who is more interested in a short survey of
recent (1968-1974) work in pattern recognition will profit from Kanal's
paper (Ref So).
processing or implicitly incorporates it into feature selection.

The overall design process is depicted in schematic form in Fig 3.
It may be noted that this figure bears some resemblance to the one that
shows the canonical form of a pattern recognition system.

In this case, the pattern recognition design procedure may
well provide feedback in the form of greater insight into the underlying
physics. In other cases the basic physical phenomenon is well understood,
but there are so many sources of uncertainty that a pattern recognition
approach is specified.

Fig 3. The Pattern Recognition Design Process

After some understanding of the basic laws of the processes is
gained, one must select an appropriate sensor, a device that will
act as an interface between the environment and the pattern recognition
device. Frequently, the type of sensor will be given; such is the case
with the problem in this thesis. Numerous questions must be asked of
any proposed sensor, such as does it have sufficient bandwidth and
resolution? Must the sensor be a coherent device, i.e. must phase
information be available? What sort of contamination of the received
signal will the sensor introduce? These and other questions necessitate
the feedback loop back to the sensor selection block in the diagram. Some
of the questions about the sensor cannot be answered until the performance
of the whole system has been evaluated. At that time, the designer may
find that satisfactory performance simply can't be achieved with the
sensor selected. Alternatively, the design may indicate that some of the
sensor complexity may be eliminated with a minimal sacrifice in overall
performance. For example, coherence in a radar implies two video channels
rather than one, which in turn means additional hardware and signal
processing costs that may not be justifiable.


Following the selection of a candidate sensor, the designer must
generate a representative data base. In practical terms, this means
assembling representative samples of the objects to be identified, sensing
them under the various conditions under which they will most likely
appear, and recording their signatures. The collection of live data can be
the most costly step in the design process; however, in some cases an
excellent data base may already be available. The pattern recognition
practitioner will seldom be an expert at taking electromagnetic, seismic,
or other physical measurements and must leave that work to those that
are competent in the particular field of measurements. However, if
the designer has a good understanding of the processes involved, he will
be in a position to provide input as to the conditions under which the
data must be taken. Also, if he has a good understanding of the
measurement process, he will certainly have increased insight into the
measured data. The designer's primary responsibility in this area is to
insure that the data collection is truly representative of the objects to
be identified. For example, if three-dimensional objects that have aspect
dependent signatures are to be recognized, observations must be taken
at various aspects to insure a representative data base.

Under some circumstances, the researcher may not have access to
actual data, or it may not be feasible to gather it due to lack of the
required resources. At times, such obstacles may be circumvented through
physical or mathematical modeling. In his aircraft identification
research, Ksienski (Ref 41) has used scale-model aircraft with conductive
coatings and appropriately frequency-scaled radars. Lin and Richmond
(Ref SO) have numerically computed the electromagnetic scattering
signatures of aircraft based on wire grid models. The pattern recognition
researcher must insure that such models accurately portray the processes
of interest.

Preprocessing the data for the design procedure is similar to the
corresponding step in the recognition system. Assuming that the
classification will be done digitally, the data must be converted from an
analog or instrumentation format to one that is computer compatible, and
then filtered. Also, an appropriate pattern space must be chosen, one in
which the physical peculiarities of the various classes are manifest.
After the data has been preprocessed, it should be subjected to a
statistical analysis. Initially, the analysis will consist of the
designer studying various plots of the data in order to correlate the
representation of the data in the pattern space to the underlying
physical processes. Then statistics of the process are computed,
such as means and variances of potential features, as well as
signal-to-noise ratios. Histograms, scatter plots, and cluster analysis
routines may also be of value in this phase, and later when specific
features are being chosen. One of the goals of the designer in this
phase is to estimate the statistics of the data. It may be possible to
apply probability density function estimation techniques such as Parzen
windows (Ref 18:88-95).
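
A Parzen window estimate places a kernel on each design sample and averages
the results. The sketch below assumes a one-dimensional feature, a Gaussian
kernel, and a hand-picked window width h; the sample values are hypothetical
and serve only to show the mechanics of the estimate.

```python
import math

def parzen_density(samples, x, h):
    """Gaussian-kernel Parzen estimate of a 1-D density at x, window width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

# Hypothetical measurements of a single feature for one class.
samples = [0.8, 0.9, 1.0, 1.1, 1.2]
near = parzen_density(samples, 1.0, 0.2)
far = parzen_density(samples, 3.0, 0.2)

# Crude Riemann sum to confirm that the estimate integrates to about one.
step = 0.01
total = sum(parzen_density(samples, -2.0 + i * step, 0.2) * step
            for i in range(600))
print(near, far, total)
```

The choice of h trades smoothness against resolution: a small window follows
the individual samples closely, while a large window blurs the structure of
the density.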

Mathematical simulation of the measured processes forces the
designer to quantify his observations. The modeling process may begin
with block 1 and may be refined as the designer gains more insight.
Simulation will also provide potential features to be used and may
suggest the form of the classifier. For example, if all of the statistics
of the data are known, a Bayes classifier is appropriate.
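
When the class statistics are known, the Bayes rule assigns an observation to
the class with the largest product of prior probability and
class-conditional density. The following sketch assumes, purely for
illustration, a single feature with Gaussian class-conditional densities; the
class names, priors, means, and variances are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bayes_classify(x, classes):
    """Pick the class maximizing prior * likelihood (minimum-error rule).

    classes: dict mapping label -> (prior, mean, variance).
    """
    return max(classes,
               key=lambda c: classes[c][0] * gaussian_pdf(x, classes[c][1],
                                                          classes[c][2]))

# Hypothetical two-class problem with equal priors and equal variances.
classes = {"A": (0.5, 2.0, 1.0), "B": (0.5, 5.0, 1.0)}
decision_low = bayes_classify(3.0, classes)
decision_high = bayes_classify(4.0, classes)
print(decision_low, decision_high)
```

With equal priors and variances the decision boundary falls midway between
the two means; unequal priors would shift it toward the less probable class.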


At this point, the designer will have in mind numerous features
that bear class-discriminating information. General, as well as
problem-specific, considerations in design feature extraction and feature
selection will be discussed in greater detail in a subsequent chapter. The
feedback loop from block 10 indicates that the designer must have in mind
the type of classifier to be used when extracting and evaluating features.

After the features to be used are chosen and the form of the
classifier has been specified, the overall performance of the pattern
recognition system may be evaluated. The traditional figure of merit for a
pattern recognition system is the mean probability of error. Toussaint
(Ref 66) has provided a definitive treatment of the estimation of the mean
probability of error as well as an exhaustive bibliography of research
on the subject. Shrihari (Ref 61) has defined classifier bias and
has shown the importance of this measure in the multiobservation case.
Relative economy of implementation in terms of quantity of hardware
required and the time taken to make a decision are also relevant
performance indicators. If a system is not feasible at present because
of the quantity of data processing required, this may not be a serious
long term limitation in view of the dynamics of large scale integrated
circuit technology. It has been estimated that by 1985, it will be
possible to build a hand-held calculator that possesses all of the
computing power of the largest main frame computers of today (Ref 7).
Performance evaluation will be considered in more detail in a
subsequent chapter.
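
One common way to estimate the probability of error from a limited design set
is the leave-one-out method, in which each sample is classified using all of
the others. The sketch below applies it to a nearest neighbor rule; it is
offered as an illustration of the idea and is not necessarily the estimator
used in this work. The data points are hypothetical.

```python
import math

def leave_one_out_error(data):
    """Fraction of samples misclassified by 1-NN when each is held out in turn.

    data: list of (feature_vector, label) pairs.
    """
    errors = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        guess = min(rest, key=lambda pair: math.dist(pair[0], x))[1]
        errors += guess != label
    return errors / len(data)

# Hypothetical design set: two tight clusters plus one class-B sample
# that lies closer to cluster A.
data = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
        ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B"),
        ((2.0, 2.0), "B")]
error_rate = leave_one_out_error(data)
print(error_rate)
```

Only the stray sample is misclassified here, so the estimate is one error in
seven trials. Estimates of this kind are themselves random variables, and
their variance shrinks only slowly as the design set grows.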
If the overall system performance is not satisfactory, the
designer must return to some earlier phase and make appropriate revisions.
The figure shows the feedback loop to the feature extraction stage.
This is the most reasonable place to begin; however, the designer may
ultimately be forced to return to any earlier phase including a
re-examination of the basic physics of the processes of interest. In point
of fact, although the feedback loops are not explicitly depicted in the
diagram, the designer should feel free at any phase of the design process
to return to some earlier stage to make refinements. There is, of course,
no guarantee that this process will actually produce the desired
classification performance in any arbitrary problem.
Summary by Chapters

Throughout this discussion the underlying research question is how
to distinguish between different radar targets that have periodically
amplitude modulated signatures.

In Chapter II the basic physical phenomena are investigated.
The nature of the modulation is discussed, and the representation of
modulated signatures in short-time Fourier space is developed. The
manifestations of nonstationarities in the signatures are presented.
The data bases, derived from returns from live radar targets, that are
used to design and test a target classification algorithm are described.

In Chapter III, the identification problem is formulated in terms
of optimal classification techniques. Discussions of Bayes
classification, composite hypothesis testing, and sequential hypothesis
testing are included. It is shown that, although the optimal techniques give
insight into the nature of potential solutions to the problem,
computational difficulties and lack of a complete physical understanding
preclude their direct application.

Chapter IV presents a suboptimal frequency domain classifier.
Preprocessing, feature extraction, and feature selection as performed on
the design set data base are discussed. The signatures are identified
using both linear discriminant functions and nearest neighbor classifiers.

Chapter V summarizes the conclusions to be drawn from this work.
Areas of future research that pertain to the identification of radar
targets that possess periodically amplitude modulated signatures are
indicated.
II Periodically Amplitude Modulated Targets

This chapter discusses a class of radar targets that exhibit
periodically amplitude modulated signatures. The physics underlying
the modulation is presented, and its consequences in the frequency domain
are examined. Finally, the data bases consisting of the returns from
live radar targets, that are used in subsequent classification
experiments, are described.
The Physical Phenomenon

The high frequency radar echo from moving targets such as aircraft,
ships, or vehicles is composed of the vector sum of a group of
superimposed echo signals from the individual parts of the target.
Nontranslational motion of the target and its associated components causes
time varying fluctuations in the target RCS. Such fluctuations are manifest
in the target signature in the form of a spectral spreading.
In classical radar detection, these fluctuations are considered to be
target noise (Ref 19). A moving radar target that possesses an amplitude
modulated signature is sometimes referred to as a doppler spread target,
because of the resultant broadening of the frequency spectrum of the
target. The optimum detector for a doppler spread target has previously
been derived (Ref 70:357-375), but classification of different amplitude
modulated targets is more difficult.
Dunn and Howard (Ref 19) have characterized these target fluctuations
as amplitude noise and have segregated them into two types: low-frequency
and high-frequency. The low-frequency fluctuations are due to random yaw,
pitch, and roll motions of the target. This low-frequency variation,
resulting in a broadening of the skin line of the target, was not used for
target identification in this research. The skin line of a radar target
is defined as that portion of the target's spectral signature that
contains the energy scattered by the skin of the target. Typically this
energy is concentrated at a single frequency or over a very narrow range
of frequencies as indicated in Fig 1 of the preceding chapter. The
high-frequency modulation consists of both random noise and periodic
modulation. The random component results from skin vibration and
random motion of target components. The periodic modulation is
attributable to rapidly rotating parts of the target such as aircraft
propellers, ship radar antennas, or vehicle running gear. The periodic
RCS of a heavily-lugged, agricultural tractor tire, measured by Frost,
is shown in Fig 4. The tire was mounted on a turntable, and its RCS was
measured as a function of angle of rotation.

To understand the effect that a rotating structure has on the radar
signature of a target, it is convenient to first consider a scatterer
that has a simple geometry. Van Bladel (Ref 68) has presented a detailed
exposition of electromagnetic fields in the presence of rotating
scatterers. Chuang (Ref 11) has solved for the monostatic (i.e. source and
observation points colocated) power spectrum of a rotating, polygonal
cylinder in the high frequency region.

We may consider an infinitely long, conducting polygonal cylinder
that is rotating at an angular velocity $\Omega$ about its longitudinal axis.
The cylinder is in the far field of a radar of radian frequency $\omega_0$ with
boresight normal to the cylinder's axis. Because of the rotation of
the cylinder, the electromagnetic scattering is a periodic function of
time with period $T$ where

$T = 2\pi / (N \Omega)$    (6)

for an equilateral, N-sided polygonal cylinder. For $2\pi/T \ll \omega_0$, the
magnitude of the backscattered field may be written as

$E_s(t) = M(t) \exp(j \omega_0 t)$    (7)

where $M(t)$ is a complex modulating function with period $T$. Because $M(t)$
is periodic, it can be represented as a Fourier series

$M(t) = \sum_{i=-\infty}^{\infty} F_i \exp(j 2\pi i t / T)$    (8)

Thus the backscattered field is of the form

$E_s(t) = \sum_{i=-\infty}^{\infty} F_i \exp[j(\omega_0 + 2\pi i / T) t]$    (9)

which has a power spectrum

$S(\omega) = 2\pi \sum_{i=-\infty}^{\infty} |F_i|^2 \delta[(\omega - \omega_0) - 2\pi i / T]$    (10)
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where §(*) represents the Kronecker delta. Thus the power spectrum of the 
backscatter from a rotating polygonal cylinder is discrete, being nonzero 
only at frequencies given by 


wen ING , 1=0,+1,+2,... (11) 


Fig 5 depicts such a spectrum computed by Chuang using the Geometric Theory 
of Diffraction (Ref 71). 
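
The discrete character of this spectrum can be illustrated numerically.
The following is a minimal sketch, not Chuang's computation: it assumes
an arbitrary real modulating function with the period of a hypothetical
four-sided cylinder turning at two revolutions per second, removes the
carrier, and uses a naive DFT. The names N_SIDES, F_ROT, FS, and NUM are
illustrative assumptions.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (standard library only)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

N_SIDES = 4    # hypothetical number of polygon sides, N
F_ROT = 2.0    # hypothetical rotation rate in rev/s, so the line spacing is 8 Hz
FS = 64.0      # sampling rate in Hz (carrier already removed)
NUM = 64       # one second of data, giving 1 Hz bin spacing

def modulation(t):
    """Arbitrary real modulation with period T = 1 / (N_SIDES * F_ROT)."""
    phase = 2.0 * math.pi * N_SIDES * F_ROT * t
    return 1.0 + 0.5 * math.cos(phase) + 0.2 * math.cos(2.0 * phase)

samples = [modulation(k / FS) for k in range(NUM)]
magnitude = [abs(v) / NUM for v in dft(samples)]

# Only bins at multiples of the 8 Hz line spacing carry energy.
line_bins = [m for m, mag in enumerate(magnitude) if mag > 1e-6]
print(line_bins)  # bins 0, 8, 16 and the negative-frequency images 48, 56
```

Because the assumed modulation is real, the lines are symmetric about
the carrier, as they would be for a regular polygon.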
Four observations about this type of spectrum that are useful in a
target recognition context are:
1. The spectrum is symmetric about the carrier frequency if the
polygon is regular; however, it will not generally be symmetric for
objects of arbitrary cross section.
2. The component at the carrier frequency tends to be the maximum
signal, although it is not always.
3. As $|i|$ increases, the spectrum tends to decrease monotonically in
magnitude.
4. If we consider the cylinder to have a translational velocity in
addition to its radial velocity, then the amplitude spectrum drops sharply
when

where $\omega_d$ is the doppler radian frequency. If a is the maximum radius of
the rolling cylinder and c the speed of light, then

(15)

where $v_r$ is the target radial velocity,
where i ranges over all integers such that

There are numerous target parameters that will affect the power
spectrum of the received signal, four of which are range, elevation angle,
azimuth angle and velocity. Range changes alter the $F_i$ but, as long as
the far field assumption may be maintained, the range variation may be
removed by simply normalizing by the peak amplitude in the spectrum.

For the simple type of scatterer considered thus far, the elevation
angle between radar and target affects only $\omega_d$ since $\omega_d$ is proportional
to $v_r$, which is the vector inner product of radar boresight and target
velocity. For more complex targets, however, the $F_i$ are a function of
elevation. Consider, for example, that the rotating structure represents
the partially exposed running gear of a vehicle. At certain elevations
more energy will be scattered by the rotating body than at others.
Both $F_i$ and $\omega_d$ are functions of the azimuth angle between the radar
and target. The $F_i$ will vary with azimuth since the RCS of the scatterer
will change, and $\omega_d$ will change since the radial velocity is a function
of target aspect.
Finally, the whole spectrum will vary in an accordion-like fashion
with the target velocity v. If the elevation angle $\phi$ is defined as the
angle measured from the horizon to the radar boresight, and the azimuth
angle $\theta$ is defined as the angle measured clockwise from the target velocity
vector to the radar boresight, the radial velocity may be written as:


and the pitch is 
Aw a if 19) 


For fixed target parameters, letting r represent range, the
power spectrum can be expressed as

\[ S(\omega, r, \theta, \phi, v) = \sum_i F_i(r, \theta, \phi) \, \delta(\omega - 4\pi v \sin\theta \sin\phi - iNv) \tag{20} \]

where i ranges over all integers such that

The discussion to this point has assumed that all target parameters
are constant. If any of the target parameters vary, as they invariably
will for any real radar target, the received signal can no longer be
assumed stationary, and the formal requirements for the existence of the
power spectral density are no longer met. The short-time spectrum can,
however, still be computed and will be a useful representation of the
target signature if the observation interval is chosen sufficiently
short.

Two separate data bases were available for analysis. Both
consisted of the time domain signatures of vehicles driven through the
radar beam transmitted from fixed antennas. The two sets are referred
to as Data Bases A and B. Of the two, Data Base A was the most complete
and appeared to be the most reliable.

Data Base A. The radar used to collect the data for Data Base A
was a high pulse repetition frequency (PRF), coherent system. The
frequency transmitted was sufficiently high to be in the optical region
(where the scatterer's dimensions are large with respect to wavelength)
of the targets used. A block diagram of the radar is shown in Fig 6.

The stable local oscillator (stalo), shown at the upper left side
of the figure, provides the stable reference frequency used for both
the transmitter and receiver. The transmitter signal is shifted upward
in frequency by $f_{IF}$, the intermediate frequency (IF) of the clock, in the
transmitter mixer. This signal is filtered, amplified, and gated on
and off at the PRF to form the transmitted signal which is radiated
toward the target.


The received signal passes through the antenna to a circulator and
then to radio frequency (rf) amplifiers and the first mixer. The
reference frequency to the first mixer is the stalo output, $f_0$. The
output of this mixer is the IF signal at $f_{IF}$. The output signal of the mixer
for the offset video channel is on a low frequency carrier. After
amplification, the offset video signal is passed through boxcar circuits,
the timing of which determines the range gate. Following more
amplification, the offset video signal is recorded and its spectrum is
displayed on the spectrum analyzer. The spectrum being shifted away from
zero doppler permits its being observed without folding.


To preserve the phase information in the signal, in-phase (I) and
quadrature (Q) video channels are required; hence, the IF signal passes
through two additional mixers. The references for these mixers are at
$f_{IF}$ but are 90 degrees out of phase with one another. The outputs
of these channels are amplified, passed through the boxcar circuits, and
recorded. The I and Q video channels are matched in amplitude.
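
The role of the quadrature channel can be seen in a small numerical
sketch. It assumes an idealized, noiseless return at a hypothetical
doppler of -5 Hz and compares the spectrum of the in-phase channel alone
with that of the complex signal I + jQ. The sampling values FS and NUM
are illustrative assumptions and are unrelated to the actual radar.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude of a naive O(n^2) DFT, scaled by the record length."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * m * k / n)
                    for k in range(n))) / n
            for m in range(n)]

FS = 32.0          # hypothetical sampling rate, Hz
NUM = 32           # one second of data, 1 Hz per bin
F_DOPPLER = -5.0   # hypothetical negative (receding) doppler, Hz

i_channel = [math.cos(2.0 * math.pi * F_DOPPLER * k / FS) for k in range(NUM)]
q_channel = [math.sin(2.0 * math.pi * F_DOPPLER * k / FS) for k in range(NUM)]
complex_video = [complex(i, q) for i, q in zip(i_channel, q_channel)]

# With I and Q the line appears only at -5 Hz (bin 27 of 32).
peaks_iq = [m for m, v in enumerate(dft_magnitude(complex_video)) if v > 0.1]
# With I alone the sign of the doppler is lost: lines at +5 Hz and -5 Hz.
peaks_i_only = [m for m, v in enumerate(dft_magnitude(i_channel)) if v > 0.1]
print(peaks_iq, peaks_i_only)
```

The second result is the folding that the offset video channel and the
I and Q pair are each intended to avoid.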


The tape recorder used was an analog Ampex ER-1S00 which has seven
data channels plus a voice channel. With the tape speed used, 60 inches
per second, the recorded band is 0 to 20 kHz on the FM channels with a
signal-to-noise ratio of 44 dB and a harmonic distortion of two percent
or less. Although the FM channels were the primary data channels,
the I and Q video were sometimes recorded on the greater bandwidth AM
channels as well. This was done to insure that no high frequency
information was lost.


Signatures were recorded of three separate target vehicles under a
variety of conditions. The target vehicles will be referred to as Target
1, Target 2, and Target 3. Target 1 had running gear that was quite
dissimilar to that of Targets 2 and 3, while the latter two were quite
similar in that respect. Approximately 200 runs each were recorded for
Targets 1 and 2, while 35 were made for Target 3.

Signatures were recorded for Targets 1 and 2 positioned on a rotary
platform. The vehicles were placed on jacks so that the running gear
could turn freely without the vehicle moving off of the platform.
Antenna depression angles and target slant ranges used were 4.5° at
6m and 26.3° at 43.5 m. The rotary platform data was not used
in the pattern recognition analysis reported here. It was used to gain
some insight into the nature of the signal modulation produced by the
vehicles. This part of the data is free of one source of noise,
specifically that caused by the vehicle traversing rough terrain.

The data that was used in the pattern recognition analysis was
taken with each of the three targets moving through the radar beam in
the center of a grassy field. This field data was taken with an
antenna depression angle of 3.5° and a slant range of S25 m, where
slant range is defined as the distance from the radar antenna to the
target.

Both rotary platform and field runs were taken at 22.5 degree
azimuth angle intervals beginning with the vehicle headed toward the radar.
The returns were monostatic. The antennas used were linearly polarized,
and both horizontal and vertical samples were taken. The horizontally
polarized returns were used in this analysis since no vertically
polarized returns from Target 3 were recorded. It should be noted,
however, that the level of sideband modulation compared to skin line
amplitude appeared to be slightly higher for vertical polarization.
During the field runs, the vehicles were sometimes at relatively constant
velocity, and at other times they were accelerating.

Data Base B. The experimental set up for the second set of data
was similar to that for Data Base A except that the radar used was CW
instead of pulsed. Also, the runs were only taken at every 45° of
azimuth, and there were no platform runs. Since the radar was not
coherent and no offset was used, there may have been some frequency
foldover in the spectra, but it was negligible. Observation 4 of the
previous section indicates that the target spectrum should drop
sharply at zero Hz, and this was confirmed by examining the spectra
of live data. Thus any foldover of significant amplitude will be
caused by negative doppler clutter which will be concentrated at
very low frequencies. This low frequency foldover is of no concern
since the clutter so thoroughly dominates the signal at these
frequencies that this band is useless anyway.

Data Base B consisted of returns from five different vehicles,
Targets 1, 2, 4, 5, and 6. Targets 1 and 4 had similar running gear
and produced comparable signal modulation, while Targets 2, 5, and 6
were from the same general category. The Targets 1 and 2 vehicles used
in the Data Base B measurements were not the exact same ones used for
Data Base A, but they were of the same respective model.

III The Optimal Classifier

This chapter presents optimal classification techniques, including
Bayes classification, composite hypothesis testing, and sequential
classification. It is concluded that such techniques, although
conceptually straightforward, are not directly applicable to the problem at
hand because of the lack of a complete physical model and complexities
in the data and required implementation.
Bayes Classifier 

Automatic target identification may be considered to be an applica- 
tion of statistical hypothesis testing. In such a context, it is well 
known that the Bayes classifier is optimal in the sense that it minimizes 
the expected risk or probability of error (Ref 25:89). This optimality 
is due to the fact that the Bayes Classifier uses all of the statistical 
information from the problem as efficiently as possible. A Bayes formu- 
lation implies that the identification problem may be cast strictly in 
statistical terms and that all statistical information about the problem 
is known. Neither of these conditions may be strictly true, but as in
any complex physical problem certain simplifying assumptions may be made
that will allow us to proceed. For example, the class conditional
densities are generally unknown but may be approximated in one of two
different ways or by a combination of the two. One method is to attempt
to deduce the relevant statistics from a careful study of the underlying
physics of the process. The second method is to ignore the physics and
simply use the statistics of the training set in an attempt to estimate 
the class conditional densities using some method such as Parzen windows 
(Ref 18:88-95). 
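
The second method can be sketched for a single scalar feature. The
example below is illustrative only: the Gaussian kernel, the window
width h, and the training values are assumptions introduced here and are
not taken from the data bases described in this report.

```python
import math

def parzen_density(x, training, h):
    """Parzen-window estimate of p(x) using a Gaussian kernel of width h."""
    norm = 1.0 / (len(training) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in training)

# Hypothetical training values of one feature for two classes.
class_a = [0.9, 1.0, 1.1, 1.2, 0.8]
class_b = [2.9, 3.1, 3.0, 3.3, 2.7]

x = 1.05
p_a = parzen_density(x, class_a, 0.3)   # estimate of p(x | class a)
p_b = parzen_density(x, class_b, 0.3)   # estimate of p(x | class b)

# Sanity check: the estimate should integrate to approximately one.
area_a = sum(parzen_density(-2.0 + 0.01 * k, class_a, 0.3) * 0.01
             for k in range(700))
print(p_a > p_b, round(area_a, 3))
```

The choice of h trades smoothness against fidelity to the training set;
with few samples per class the estimate is sensitive to that choice.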


In its most abstract setting any target identification scheme is a
rule for assigning any observed target signature to a class:

\[ A: B \rightarrow \Omega \tag{22} \]

where A is the identification rule, B is the observation space, and
$\Omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$ is the finite set of target classes. (It is possible
that one of the $\omega_i$ may represent a rejection class, i.e. no decision
is made.) Thus, from a geometric point of view, the decision rule
partitions the observation space and assigns class labels to the various
regions:


where

\[ B_i \cap B_j = \emptyset, \quad \forall i \neq j \tag{24} \]

and

\[ B = \bigcup_{i=1}^{m} B_i \tag{25} \]

In the general Bayesian formulation, a cost is assigned to each type
of decision:

\[ C_{ij} = \text{cost of deciding } \omega_i \text{ when } \omega_j \text{ is true} \tag{26} \]

If a symmetric cost assignment is made, i.e.

\[ C_{ij} = \begin{cases} 0 & \text{if } i = j \\ 1 & \text{if } i \neq j \end{cases} \tag{27} \]

then the Bayes decision rule is to select the $\omega_i$ for which

\[ P(\omega_i | X) > P(\omega_j | X), \quad \forall j \neq i \tag{28} \]

where X represents an observation. In this discussion $P(\cdot)$ is used to
designate a probability and $p(\cdot)$ represents a probability density
function. In words, the rule simply says to take an observation and assign
it to the class for which the posterior probability is the greatest;
therefore, this special case is frequently called the maximum a
posteriori (MAP) classifier. By applying Bayes' theorem, the posterior
probabilities may be calculated from the prior probabilities $P(\omega_i)$ and
the class conditional densities $p(X | \omega_i)$:

\[ P(\omega_i | X) = \frac{P(\omega_i) \, p(X | \omega_i)}{p(X)} \tag{29} \]

where

\[ p(X) = \sum_{i=1}^{m} P(\omega_i) \, p(X | \omega_i) \tag{30} \]
i=] 


The decision rule then becomes to choose $\omega_i$ such that

\[ P(\omega_i) \, p(X | \omega_i) > P(\omega_j) \, p(X | \omega_j), \quad \forall j \neq i \tag{31} \]


For this multihypothesis problem, when the prior probabilities are equal,
it is frequently convenient to use the likelihood ratio test:
choose $\omega_i$ such that

\[ \ell(X) = \frac{p(X | \omega_i)}{p(X | \omega_j)} > 1, \quad \forall j \neq i \tag{32} \]

If in addition to equal prior probabilities, the class conditional
densities are of an exponential form, then the log likelihood ratio test
is normally used: choose $\omega_i$ such that

\[ \ln \ell(X) = \ln p(X | \omega_i) - \ln p(X | \omega_j) > 0, \quad \forall j \neq i \tag{33} \]
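
The decision rule of Eq (31), evaluated in logarithmic form, can be
sketched for a scalar observation. The example assumes Gaussian class
conditional densities with known means and standard deviations; the
class labels, priors, and numerical values are hypothetical and serve
only to show how the priors shift the decision boundary.

```python
import math

def gaussian_log_density(x, mean, sigma):
    """ln p(x | class) for an assumed Gaussian class conditional density."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - 0.5 * ((x - mean) / sigma) ** 2)

def map_classify(x, classes):
    """Choose the class maximizing ln P(w_i) + ln p(x | w_i).

    classes maps a label to the tuple (prior, mean, sigma).
    """
    def score(label):
        prior, mean, sigma = classes[label]
        return math.log(prior) + gaussian_log_density(x, mean, sigma)
    return max(classes, key=score)

# Equal priors: the rule reduces to the log likelihood ratio test.
equal = {"target_1": (0.5, 0.0, 1.0), "target_2": (0.5, 2.0, 1.0)}
# Unequal priors move the boundary toward the less probable class.
skewed = {"target_1": (0.9, 0.0, 1.0), "target_2": (0.1, 2.0, 1.0)}

print(map_classify(1.6, equal))    # boundary at x = 1.0
print(map_classify(1.6, skewed))   # boundary at x = 1 + ln(9)/2
```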


For general cost functions, the expected value of the cost or
Bayes risk may be written (Ref 69:47) as


It is well known that the Bayes risk for this general case may be
minimized by selecting the $\omega_k$ such that


see, for example, Fukunaga (Ref 25:74-75) or Van Trees (Ref 69:46-52).

If the PAM targets were always to be identified with fixed, given
parameters, and it could be assumed that the only source of uncertainty
in the signatures was additive white Gaussian noise, the form of the
Bayes classifier would be easy to specify. Under these conditions, it is
well known (Ref 25:90-91) that the Bayes classifier may be implemented
using matched filters. Such a classifier consists of a bank of matched
filters, one for each type of target, followed by a comparator network
to choose the largest output. The target would be assigned to the class
whose matched filter gave the largest output.
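
A bank of this kind can be sketched as a set of correlators followed by
a comparator. The sketch assumes real, sampled signatures and
normalizes each reference to unit energy so that the outputs are
directly comparable; the two templates and the observation are invented
for illustration.

```python
import math

def unit_energy(template):
    """Scale a template so that the sum of its squared samples is one."""
    norm = math.sqrt(sum(v * v for v in template))
    return [v / norm for v in template]

def matched_filter_bank(observation, templates):
    """Correlate the observation with each reference and pick the largest."""
    outputs = {}
    for label, template in templates.items():
        reference = unit_energy(template)
        outputs[label] = sum(o * r for o, r in zip(observation, reference))
    decision = max(outputs, key=outputs.get)   # comparator network
    return decision, outputs

# Hypothetical equal-energy signatures for two target types.
templates = {
    "target_1": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "target_2": [1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
}
# The target_2 signature with a small perturbation standing in for noise.
observation = [1.1, 0.8, 0.1, -0.1, 0.9, 1.2, 0.0, 0.1]

decision, outputs = matched_filter_bank(observation, templates)
print(decision, outputs)
```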

If sources of uncertainty that are present in the actual data are
added, this simple matched filter realization is no longer optimal. As
a real target traverses the terrain, irregularities in the surface cause
changes in the target parameters resulting in modulation in both
amplitude and position of the sideband spikes. Furthermore, if the
classification is done digitally, distortions due to discretization and
finite observation periods occur. Such digitally induced distortion will be
discussed in greater detail in the following chapter. Finally, it is
necessary to be able to perform the identification process with generally

As was noted in Chapter II, the signature of a PAM target depends
not only upon the type of target but also upon the target parameters
range, velocity, and aspect angles. These target parameters are
transformed by the electromagnetic scattering process into signal
parameters. The signal parameters, conditioned on the type of target being
illuminated, specify the magnitudes and locations of the discrete Fourier
components of the spectral signature of a simple PAM target. For this
simplified example, the received signal for each class would be known
except for a finite set of parameters and noise. This is the composite
hypothesis testing formulation (Ref 69:86-90). In the following discussion
the set of unknown signal parameters is represented by the vector $\psi$.


If all of the probability densities can be specified, the problem may be
reduced to a simple hypothesis testing one by integrating over the
parameter space:

\[ p(X | \omega_i) = \int_{\Psi} p(X | \omega_i, \psi) \, p(\psi | \omega_i) \, d\psi \tag{36} \]

where $\Psi$ represents the parameter space. Then Equation (31) may be used
to make an optimal decision. When it is desired to estimate $\psi$, this
problem is referred to as simultaneous detection and estimation in the
communications theory literature. Entire dissertations have been devoted to
the solution of this problem, e.g. Gobien (Ref 28).
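
The integration over the parameter space can be approximated
numerically when the densities are available. The sketch below is a toy
case with strong assumptions: a scalar observation, a single unknown
parameter that scales a hypothetical class signal level, a uniform
parameter density, and Gaussian noise of known standard deviation.

```python
import math

def gaussian(x, mean, sigma):
    """Gaussian probability density."""
    return (math.exp(-0.5 * ((x - mean) / sigma) ** 2)
            / (sigma * math.sqrt(2.0 * math.pi)))

def marginal_likelihood(x, level, sigma, psi_lo, psi_hi, steps=200):
    """Midpoint-rule integral of p(x | class, psi) p(psi | class) over psi.

    The unknown parameter psi scales the class signal level and is
    assumed uniformly distributed on [psi_lo, psi_hi].
    """
    step = (psi_hi - psi_lo) / steps
    prior = 1.0 / (psi_hi - psi_lo)
    total = 0.0
    for k in range(steps):
        psi = psi_lo + (k + 0.5) * step
        total += gaussian(x, psi * level, sigma) * prior * step
    return total

x = 1.2
like_1 = marginal_likelihood(x, 1.0, 0.5, 0.5, 1.5)   # hypothetical class 1
like_2 = marginal_likelihood(x, 3.0, 0.5, 0.5, 1.5)   # hypothetical class 2
decision = "class_1" if like_1 > like_2 else "class_2"
print(decision, like_1, like_2)
```

With equal priors, comparing the two marginal likelihoods is the simple
hypothesis test to which the composite problem has been reduced.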

Under certain circumstances, it may be assumed that the radar
processor maintains a track file on the target, and good estimates of all
the parameters are available. The decision rule for the multiclass
problem with a symmetric cost function then becomes to choose the $\omega_i$ for
which

\[ P(\omega_i) \, p(X | \omega_i, \psi) > P(\omega_j) \, p(X | \omega_j, \psi), \quad \forall j \neq i \tag{37} \]

The additional knowledge about the random parameters is simply
incorporated in the class conditional densities.
Robinson and DeNuzzso (Ref 57) argue persuasively that it may not be
possible to evaluate class conditional densities as in Eqs (36) or (37).
They suggest instead that the prior probabilities be modified. The
prior probability would be multiplied by a fuzzy set membership function
(Ref 71) for each known parameter. For example, if the target velocity
is known to be in the middle of the operating range for class 1 and
toward the high end for class 2, the fuzzy set membership function for
that velocity might be unity for class 1 and 0.5 for class 2.
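
The velocity example can be written out as a short sketch. The
membership values 1.0 and 0.5 are those of the example; the equal
starting priors and the renormalization of the products so that they
sum to one are assumptions made here for illustration.

```python
def weight_priors(priors, memberships):
    """Multiply each prior by its fuzzy membership value and renormalize."""
    weighted = [p * m for p, m in zip(priors, memberships)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Equal priors for classes 1 and 2; memberships for the known velocity.
modified = weight_priors([0.5, 0.5], [1.0, 0.5])
print(modified)  # class 1 is now favored two to one
```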
Another practical alternative would be to normalize the signature
with respect to certain parameters. For example, target range may not
provide any useful discriminating information. As long as the target
is in the far field, range only affects the signal amplitude and,
consequently, the signal to noise ratio. Then the signal could be energy
normalized without losing any information, assuming the signal to noise
ratio is acceptable, and target range would be removed as a parameter.
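
Energy normalization can be sketched in a few lines. The example
assumes a noiseless spectrum whose amplitude is scaled by a
range-dependent factor; the spectral values and the factor 0.1 are
hypothetical.

```python
import math

def energy_normalize(spectrum):
    """Scale a spectrum so that the sum of its squared magnitudes is one."""
    energy = sum(abs(v) ** 2 for v in spectrum)
    scale = 1.0 / math.sqrt(energy)
    return [v * scale for v in spectrum]

near = [4.0, 1.0, 0.5, 0.25]       # hypothetical spectrum at short range
far = [0.1 * v for v in near]      # same target, weaker return

near_n = energy_normalize(near)
far_n = energy_normalize(far)
# After normalization the two signatures coincide.
print(max(abs(a - b) for a, b in zip(near_n, far_n)))
```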

Another practical approach would be to increase the number of
classes. A signature from Target 1 at zero degree azimuth may not
resemble a signature from Target 1 at 155 degrees azimuth. Thus Target
1 may be decomposed into several subclasses depending on the known
azimuth angle. The most useful approach is to consider the effect of
each parameter separately and apply whichever technique is most
appropriate.

If the problem were to identify long, rolling polygons as described
in Chapter II, a theoretically optimal classifier could be designed.
Unfortunately the rolling polygonal scatterer only qualitatively
simulates the running gear of an actual vehicle. Several sources of
uncertainty arise in the actual data that do not exist in the simplified
model. At certain aspects the rotating structures will cause specular
flashes. Rapid changes in pitch, yaw, roll, and linear velocity due to
interactions between the vehicle, terrain, and the human operator cause
complicated modulations of the signal. Also, random modulation due to
vibrations is unaccounted for in the PAM model. Since the state of the
art of numerical methods as applied to electromagnetic scattering problems
is not sufficiently advanced to account for such complexities, no complete,
quantitative model will be forthcoming in the near future.

The only practicable method of estimating the class conditional
probability densities that are required for optimal classification would
be through an extensive measurement program. The available data bases
are not extensive enough to provide good estimates, and a sufficiently
comprehensive measurement program could be exorbitantly expensive.
Because of these practical difficulties, the approach taken in the
subsequent chapter is to extract features that are relatively invariant
with respect to target parameters.
Sequential Classification 

When attempting to identify a moving radar target, it is possible
to take multiple observations to increase the reliability of the
estimate of the target class. The Wald sequential probability ratio
test (Ref 25:77-84) may be used to reduce the probability of error to
zero if enough observations are available. The Wald test has only been

assumes all of the underlying statistics are known.
The sequential observation classification technique adopted in the
sequel is a simple plurality-vote scheme. Since no effort was made to
estimate the reliability of the decision made after a number of
observations were taken, the technique might be more aptly termed a
multiple observation classification scheme. The number of observations
assigned to each class is remembered, and the final decision assigns
the target to the class with the largest number of votes. Shrihari
(Ref 61:151-179) has considered the theoretical properties of such a
voter in the context of a radar identification of aircraft problem.
It must be assumed that the radar is maintaining track on a given target,
in order to make the voter decision valid.
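
A plurality voter of this kind can be sketched as follows. The class
labels are hypothetical, and resolving ties in favor of the class that
reached the leading count first is an assumption of this sketch rather
than a rule stated in the text.

```python
from collections import Counter

def plurality_vote(single_look_decisions):
    """Return the class with the most single-look decisions.

    Ties are resolved in favor of the class encountered first, which is
    an arbitrary choice made for this sketch.
    """
    votes = Counter(single_look_decisions)
    label, _count = votes.most_common(1)[0]
    return label

# Five hypothetical single-look decisions on one tracked target.
looks = ["target_1", "target_2", "target_1", "target_3", "target_1"]
print(plurality_vote(looks))
```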


IV A Suboptimal Frequency Domain Classifier

This chapter presents the design of a suboptimal classification
scheme. The classifier is suboptimal in the sense that the error rates
achieved may be greater than the irreducible error rate. The short-time
spectra of the target signatures are computed via the FFT. Features
are then extracted that are relatively invariant to target parameters.
Target identification is performed using linear discriminant analysis
and nearest neighbor classification on the extracted features.


In the pattern recognition problem, preprocessing
enjoys an eminent position. Since the preprocessing is the first
operation performed on the sensed data, and since it includes potentially
noninvertible transformations (specifically ones that are not
monomorphic) any losses of information or distortions will be propagated
through the entire classification process. Preprocessing may be formally
defined as the transformation from the measurement space to the pattern
space.
The input to the preprocessor is typically a noisy analog signal,
the output of the sensor. The totality of all possible output signals
from the sensor, the measurement space, may be characterized as a finite
power space or a finite energy space, as appropriate. Since the
subsequent operations are usually done digitally, the first preprocessing
transformations usually consist of prefiltering and sampling the time
domain data. The prefiltering is performed, of course, to prevent
aliasing and to reduce out of band noise, and the Nyquist criterion
must be observed when selecting the sampling rate.


The data used in this experiment were all recorded on analog tape
using frequency modulation with a 20 kHz bandwidth. To determine an
appropriate sampling rate, selected data runs were digitized at 40 kHz
and transformed using an FFT. The resulting frequency domain signals
were plotted and examined. By visual inspection, it was determined
that one kHz of bandwidth would be sufficient to capture the information
of interest.
The data was put through an analog, two pole filter which is
down about 3 dB at one kHz. The antialiasing-filter frequency response
is depicted in Fig 7. The prefiltered signal was sampled at two kHz.
On Data Base A, where both the in-phase and the quadrature signals
were recorded, both channels were sampled at two kHz resulting in an
unfolded spectrum two kHz wide.

to minimize computational requirements, a phase locked loop could be
used to track the skin line (dominant pole) of the signal. The
sampling rate could be set to four times the skin line frequency.
If the features are to be extracted from the time domain signal,
no further preprocessing may be required. It is frequently true, however,
that the sampled time domain data is not an appropriate pattern space.
Then some further preprocessing transformation is required. As Pavlidis
(Ref 53:2-3) has observed, if the information pertinent to the
classification task tends to be time-limited, for example isolated time
domain pulses such as electrocardiograph data, then the time domain
is an appropriate pattern space. If, on the other hand, the essential
information tends to be spread out in the time domain, it will be
bandlimited in the frequency domain. Thus the discrete Fourier transform
(DFT) may be a candidate for the final preprocessing transformation.


Other orthogonal transforms that have found use in the pattern
recognition context are the Karhunen-Loeve, the Walsh/Hadamard, and the
Haar transforms. Andrews (Ref dr2lee els) provides a good comparison
of the various transforms. Kabrisky and Cark have used a family of
linear transforms in biologically motivated optical recognition
schemes (Ref 37).

For the present application, the space of short-time Fourier
spectra was chosen as the pattern space for several reasons. First,
and most importantly, the Fourier spectrum of a moving, periodically
amplitude modulated target is interpretable in terms of the physics
of the target as was discussed in Chapter II. Second, with the
advent of the FFT (Ref 12), the discrete Fourier spectrum can be
computed efficiently with $N \log_2 N$ computer operations where N is the
order of the FFT. Finally, it can be shown that the Fourier
transform converges nearly as rapidly as does the Karhunen-Loeve
transform (Ref 431) which is an optimum representation in a minimum
mean squared error sense.

If the usual communication-theory type assumption of stationary,
white, Gaussian random processes could be made, conventional
techniques (Refs 4 and 5:532-571) for power spectral density estimation
could be applied. Then a simple matched filter type of classifier
could be designed in the frequency domain. Since the radar returns
from moving, amplitude modulated targets are certainly not stationary,
such techniques are not directly applicable. In the previous chapter,
a Bayesian classifier was discussed, under the assumption that if
certain parameters were fixed, the time series was stationary. For
now, however, the point of view assumed is that each contiguous
short-time spectrum is simply an independent observation of the target.


The duration of the time record to be transformed via the FFT is of
critical importance, since its reciprocal is the lower bound on the
achievable frequency resolution. Thus, an increase in the length of the
time record considered results in finer frequency resolution; however,
the longer the time record is, the more apt are the nonstationarities
in the data to cause smearing in the short-time spectrum. Another
obvious drawback to processing long time records is the increase in
memory and computational power required. Since a resolution of about
two Hz was deemed desirable, a time record of one half second was
initially used.
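
The arithmetic behind this choice is the reciprocal relation between
record length and resolution. The helper below is a hypothetical
convenience function; the two kHz sampling rate is the value quoted
earlier in this chapter.

```python
def record_parameters(duration_s, sample_rate_hz):
    """Return (frequency resolution in Hz, number of samples in the record)."""
    resolution_hz = 1.0 / duration_s
    num_samples = int(round(duration_s * sample_rate_hz))
    return resolution_hz, num_samples

# A half-second record sampled at two kHz.
resolution_hz, num_samples = record_parameters(0.5, 2000.0)
print(resolution_hz, num_samples)  # 2.0 Hz resolution from 1000 samples
```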
An acceleration in the target's radial velocity during the
integration time of the FFT results in a distortion of the spectral lines
that characterize the target. Nichol (Ref 48) has described this
effect in the analysis of spectrograms of the acoustic emissions
of rotating machinery. As described in Chapter II, the time domain
signature of an amplitude modulated, moving target tends to be of
the form:

\[ s(t) = \sum_{i=1} F_i(t) \exp[j\omega_i(t) t] + n(t) \tag{38} \]

where the frequencies involved are explicit functions of time to account
for the effect of the target accelerations. Adopting the terminology
commonly used in speech analysis, each "smeared" sinusoid in Eq (38)
represents a formant. To simplify the following discussion, consider
the normalized, noiseless, instantaneously monochromatic signal:

\[ s_0(t) = \exp[j\omega_0(t) t] \tag{39} \]

Eq (39), representing a single formant, results from discarding the
noise and other frequency components in Eq (38) and normalizing the
sinusoidal amplitude to unity, tacitly assuming that this amplitude
remains constant on the interval [0,T], the FFT integration period. For
sufficiently small T, target acceleration may be considered to be linear
and thus

\[ \omega_0(t) = \omega_i + \gamma t \tag{40} \]

where $\omega_i$ represents the initial formant frequency and $\gamma T$ represents the
total shift in frequency during the acquisition period.
The continuous, short-time spectrum of Eq (39) at t = T is


By completing the square in the exponent and changing the variable of
integration, Eq (41) becomes


The integral in Eq (42) is the well known Fresnel integral that must be
evaluated numerically or graphically using the Cornu spiral. Gersch and
Kennedy (Ref 27) have evaluated this integral for various frequency shifts
to describe the spectrum of sliding tones.

Fig 8 illustrates the distortion that results to the formants for
six different shifts. This result, for both the continuous and discrete
Fourier transforms of harmonically related formants, is due to Nichol
(Ref 48). In this figure, the magnitude of the shift is proportional to
the initial frequency of the formant and increases with the order of the
harmonic. It can be seen that the height of the formant decreases with
increasing initial frequency, while the width increases, hence
conserving energy. Columns B and C illustrate the result for the DFT, where
the output is a function of the relationship between $\omega_i$ and T. Column B
represents the result when

\[ \omega_i = \frac{2\pi n}{T} \tag{44} \]

for n an integer. Column C illustrates the output when

\[ \omega_i = \left(n + \frac{1}{2}\right) \frac{2\pi}{T} \tag{45} \]

Fig 8. Power Spectra of Nonstationary Harmonic Family (Ref 48)



The frequency spreading of the formants described above is frequently
observed in the spectral plots of the experimental data. Fig 9 depicts
the short-time spectrum of an accelerating target. Notice that the skin
line at about 200 Hz exhibits the dual-peaking phenomena shown in Fig 8.
Also, the two lower energy formants at about 375 and 400 Hz show the
same type of frequency splitting.
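
The lowering and broadening of a formant can be reproduced with a
sampled sliding tone. The sketch below does not use the experimental
data: it assumes a unit-amplitude complex tone whose frequency sweeps
linearly by SHIFT Hz over a one second record, and the sampling values
are hypothetical. The phase is written so that the instantaneous
frequency moves from F_START to F_START + SHIFT, a parameterization
chosen here for clarity.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude of a naive O(n^2) DFT (standard library only)."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * m * k / n)
                    for k in range(n)))
            for m in range(n)]

FS = 128.0       # hypothetical sampling rate, Hz
NUM = 128        # one second record
F_START = 20.0   # initial formant frequency, Hz
SHIFT = 16.0     # total frequency shift during the record, Hz

def sliding_tone(shift_hz):
    """Unit-amplitude tone swept linearly from F_START by shift_hz."""
    duration = NUM / FS
    out = []
    for k in range(NUM):
        t = k / FS
        phase = 2.0 * math.pi * (F_START * t + 0.5 * shift_hz * t * t / duration)
        out.append(cmath.exp(1j * phase))
    return out

steady = dft_magnitude(sliding_tone(0.0))
smeared = dft_magnitude(sliding_tone(SHIFT))

peak_steady, peak_smeared = max(steady), max(smeared)
energy_steady = sum(v * v for v in steady)
energy_smeared = sum(v * v for v in smeared)
# The peak drops while the total energy is unchanged (Parseval's relation).
print(peak_steady, peak_smeared, energy_steady, energy_smeared)
```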

The above discussion should illustrate the type of problem that is
encountered in using short-time spectra of nonstationary time series for
target identification. It becomes apparent that one should use as short
a time record as possible to minimize distortions due to nonstationarities.
If one were to consider the selection of the ideal time window length as
an optimization problem, the cost function to be minimized could be written
as

\[ J(T) = G(T) + H(N) + I(T) \tag{46} \]

where the term G(T) is a positive, increasing function of T that
represents costs due to nonstationary distortions. The function H(N) is also
positive and increasing and typifies the computational costs involved.
The final term I(T) is a positive decreasing function of T representing
the frequency resolution costs. Because of the empirical nature and
interdependence of the various parameters of Eq (46), the explicit forms
of the functions involved would be arbitrary and relating them to
performance could be controversial, although the optimization
theoretically could easily be performed once the functions were formally
defined. Therefore the selection of T requires a certain amount of
engineering judgement and a "cut and try" attitude.

Fig 9. Spectrum of an Accelerating Target

Windowing. It is well known that the Fourier transform of the product
of two functions is the convolution of the two individual Fourier
transforms:

$$F[s(n)\,w(n)] = F[s(n)] * F[w(n)] \tag{47}$$

where the signals, convolution, and Fourier transform may be either
continuous or discrete. This theorem has an important application in the
computation of short-time spectra of amplitude modulated radar targets.
Since only a finite record of the signal of interest will be transformed,
the signal s(n) is in effect being multiplied by a weighting function
w(n) which is nonzero on a finite interval.


If one simply truncates the time series at the end of the time
record, by default, a rectangular weighting function has been applied:

$$w(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & n < 0,\; n > N-1 \end{cases} \tag{48}$$

where n is the index, and the time record consists of N points. The
rectangular weighting function has the advantage of cheapness of
implementation, and it has the narrowest attainable main lobe as illustrated
in Fig 11. The achievable resolution of the FFT is determined by the
width of the main lobe. On the debit side, the rectangular window has
relatively high sidelobes, the first sidelobe being only 13 dB below the
main lobe in the power spectrum. As is apparent from Eq (47) and Fig 11,
the convolution of the window spectral sidelobes with any spectral peaks
from the signal will result in ringing or Gibbs phenomena.


This spectral distortion may be reduced by introducing a window
function that is smoothly tapered to zero. The tapering causes an increase
in the width of the spectral main lobe and a resulting loss in resolution,
but, at the same time, it increases frequency selectivity, i.e.
the ability to resolve simultaneously signals of different amplitudes
which are separated in frequency. Some commonly used windows are given
below (Ref 4) and are shown in Fig 10, with the power spectra of each
shown in Fig 11:

Bartlett (Triangular):

$$w(n) = \begin{cases} \dfrac{2n}{N-1}, & 0 \le n \le \dfrac{N-1}{2} \\ 2 - \dfrac{2n}{N-1}, & \dfrac{N-1}{2} \le n \le N-1 \end{cases} \tag{49a}$$

Hanning:

$$w(n) = \frac{1}{2}\left[1 - \cos\left(\frac{2\pi n}{N-1}\right)\right], \quad 0 \le n \le N-1 \tag{49b}$$

Hamming:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \tag{49c}$$

Blackman:

$$w(n) = 0.42 - 0.5\cos\left(\frac{2\pi n}{N-1}\right) + 0.08\cos\left(\frac{4\pi n}{N-1}\right), \quad 0 \le n \le N-1 \tag{49d}$$


The Kaiser window (Ref 38) as defined by

$$w(n) = \frac{I_0\left[a\sqrt{m^2 - (n-m)^2}\right]}{I_0[a\,m]}, \quad 0 \le n \le N-1 \tag{50}$$

has been shown to be optimum in the sense of maximum main lobe energy
for a given peak side lobe amplitude. In Eq (50), $I_0$ is the modified
zero order Bessel function of the first kind and m equals (N-1)/2. The
parameter a may be adjusted to trade off resolution for selectivity.
The obvious disadvantage of the Kaiser window is its computational
complexity. For an exhaustive treatment of the various windows that have
been used, the reader is referred to the recent paper by Harris (Ref 32).
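The fixed windows of Eq (49) and the resolution versus selectivity trade-off can be sketched numerically. The following is a minimal illustration under stated assumptions, not the procedure used in this work: the function names are hypothetical, the transform is a direct DFT evaluated on an oversampled frequency grid, and the peak sidelobe is located by walking down the main lobe to its first null.

```python
import cmath
import math

def rectangular(n_pts):
    return [1.0] * n_pts

def bartlett(n_pts):
    half = (n_pts - 1) / 2.0
    return [2.0 * n / (n_pts - 1) if n <= half else 2.0 - 2.0 * n / (n_pts - 1)
            for n in range(n_pts)]

def hanning(n_pts):
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_pts - 1)))
            for n in range(n_pts)]

def hamming(n_pts):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_pts - 1))
            for n in range(n_pts)]

def blackman(n_pts):
    return [0.42 - 0.5 * math.cos(2.0 * math.pi * n / (n_pts - 1))
            + 0.08 * math.cos(4.0 * math.pi * n / (n_pts - 1))
            for n in range(n_pts)]

def peak_sidelobe_db(window, oversample=16):
    """Highest sidelobe of the window spectrum, in dB relative to the main lobe."""
    n_pts = len(window)
    n_grid = n_pts * oversample
    # Evaluate the transform on a fine grid over the positive frequencies only.
    mags = []
    for k in range(n_grid // 2):
        acc = sum(w * cmath.exp(-2j * math.pi * k * n / n_grid)
                  for n, w in enumerate(window))
        mags.append(abs(acc))
    # Walk down the main lobe to its first null (first local minimum).
    k = 1
    while k < len(mags) - 1 and mags[k + 1] < mags[k]:
        k += 1
    return 20.0 * math.log10(max(mags[k:]) / mags[0])

print(round(peak_sidelobe_db(rectangular(16)), 1))
print(round(peak_sidelobe_db(hanning(16)), 1))
```

The tapered window gives a markedly lower sidelobe level than the rectangular window, at the price of a wider main lobe, which is the trade-off described above.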


Selection of an optimum window for a given application would entail
minimizing a cost functional of the form

$$J(w) = G(w) + H(w) + I(w) \tag{51}$$

where G(w) is the resolution cost, H(w) is the selectivity cost, and I(w)
is the computational cost. Rather than attempt to quantitatively define
these costs and solve the optimization problem, an empirical approach
was taken. Since the baseband representation of an amplitude modulated
target was known a priori to be of the form

$$s(t) = \sum_i a_i \exp(j\omega_i t) \tag{52}$$

it was possible to multiply this signal by various windows and numerically
compute the resulting power spectra. The idealized log power
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Fig 13 shows the results of plotting 
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for each of the commonly used windows. The constant 10 was added to 
each term of the time series to scale the plot; thus 120 dB corresponds
to zero. It is readily apparent that the rectangular, Bartlett, and
Hamming windows do not allow the spectrum to go to zero as rapidly as
desired. Furthermore, the Bartlett window exhibits high quefrency ringing.
(Quefrency is a measure of time associated with the Fourier transform of
the log power spectrum of a signal.) Since the ringing due to the side
lobes of the other windows is at too high a quefrency for the sampling
rate, the Gibbs phenomenon is not apparent in the other plots. The Hanning
window was selected since its performance is satisfactory, and it is less
than half as expensive to compute as the Blackman window. Since this
simple window performed so well, no experiments were attempted with the
more complex Kaiser window family.
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Preprocessing the Data Base. As previously described, the data base 


used in this experiment consisted of the radar returns from various types 
of vehicles being driven through the beams of two types of radars at 
different azimuth angles. Both the in-phase and quadrature components 
of the received signal were recorded on analog tape for the first data 
base and only one component for the other. 

Initially, the signatures were sampled at twice the bandwidth of
the tape, and the resulting spectra were analyzed. It soon became apparent,
consistent with observation 4 about PAM targets, that the doppler spread
was essentially confined to the band between 0 and $2\omega_D$. Accordingly,
the data was sampled at a rate compatible with the highest target radial
velocity.

Because of the physical considerations previously detailed, the 
pattern space chosen was the space of short-time discrete Fourier 
amplitude spectra. An FFT integration period (corresponding to a 1024 
point FFT) was selected that was short enough to approximate quasi- 
stationarity of the time series and long enough to ensure sufficient 
spectral resolution. As discussed in the previous section, several 
time windows were considered for spectral smoothing, and it was con- 
cluded that the Hanning window provided adequate performance at accept- 
able computational cost. 

A FORTRAN computer program called WRENCH was written to perform the
required preprocessing on the Wright-Patterson AFB CDC CYBER-74 computer.
A flow chart of the program is shown in Fig 14. First the program reads
a time record from the input, digital, time domain tape. If the input
tape has been completely processed, the program stops. Otherwise, the
time data is multiplied by a Hanning window and FFTed. The FFT is, of
course, an efficient implementation of the DFT:
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The particular FFT algorithm used here is FFT2 from the International 
Mathematical and Statistical Library (Ref 35). FFT2 utilizes a modification
of the Singleton (Ref 62) version of the Cooley-Tukey FFT algorithm.
It requires the standard $N \log_2 N$ basic sets of operations.
After the complex spectrum has been computed, only the amplitude spectrum 
is retained. 
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At this point the skin line of the signature is found. A very 
simple approach was taken; the peak signal other than ground clutter was
called the skin line. This procedure found the actual skin line in over
ninety nine percent of the sample spectra; however, for a few samples
of Target 4, at zero degree azimuth, one of the sideband spikes was of
greater magnitude than the skin line. Such a sample spectrum is depicted
in Fig 15, where the skin line is at 350 Hz and the peak signal
is at 500 Hz. Since this type of phenomenon was so rare, and because
the feature extraction procedure adopted did not use the fine structure 
of the individual spectra, it was concluded that no refinement of the 
skin line extraction procedure was required. 

Subsequent to the skin line identification, three thresholding
operations were applied to the skin line magnitude and frequency. First
the skin line magnitude was compared against an empirically determined
threshold. If the magnitude was less than the threshold, the target was
considered not to be in the range or azimuth gate of the radar, and that
sample was rejected. If the sample spectrum passed the first threshold
test, the skin line doppler was checked for positiveness. If it was
negative, the spectrum was reversed

$$S_R(k) = S(N+1-k), \qquad k = 1, 2, \ldots, N \tag{57}$$

to ensure that the skin line doppler was positive for all spectra. After
this operation, the negative half of the amplitude spectrum was discarded. 
Then the skin line frequency was compared to a frequency threshold that 
corresponded to a radial velocity of about two miles per hour. If the 
skin line frequency was below that threshold, there was not enough doppler 
spread to extract useful features, and the sample was rejected. If the 
sample spectrum passed this final test, the positive half of the ampli- 
tude spectrum was written on the output tape. 

In summary, the actual preprocessing consisted of a series of trans- 
formations and tests applied to the data. First, the data was prefiltered 
and digitized. Then it was windowed and transformed via FFT, and only the 
amplitude spectrum was saved. The skin line was extracted and thresholded
in frequency and magnitude. The resulting output was the positive half of
the amplitude spectrum of the selected samples.
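The preprocessing chain summarized above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the WRENCH program: the function and parameter names (`preprocess`, `clutter_bins`, `min_magnitude`, `min_skin_bin`) are hypothetical, a direct DFT replaces the FFT for brevity, zero-based indexing is used so that negating frequency maps bin k to bin (N - k) mod N, and the thresholds stand in for the empirically determined values.

```python
import cmath
import math

def dft(x):
    n_pts = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / n_pts)
                for n, v in enumerate(x))
            for k in range(n_pts)]

def preprocess(record, clutter_bins, min_magnitude, min_skin_bin):
    """Return (positive-half amplitude spectrum, skin line bin), or None if rejected."""
    n_pts = len(record)
    # Hanning window, then DFT; only the amplitude spectrum is kept.
    windowed = [v * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_pts - 1)))
                for n, v in enumerate(record)]
    amplitude = [abs(c) for c in dft(windowed)]

    # Skin line: largest component outside the clutter region around zero doppler.
    candidates = [k for k in range(n_pts)
                  if clutter_bins < k < n_pts - clutter_bins]
    skin = max(candidates, key=lambda k: amplitude[k])

    # Test 1: magnitude threshold (target outside the range or azimuth gate).
    if amplitude[skin] < min_magnitude:
        return None

    # Test 2: negative doppler lies in the upper half of the DFT; reverse if so.
    if skin > n_pts // 2:
        amplitude = [amplitude[(n_pts - k) % n_pts] for k in range(n_pts)]
        skin = n_pts - skin

    # Discard the negative-frequency half.
    positive_half = amplitude[: n_pts // 2]

    # Test 3: frequency threshold (too little doppler spread at low speed).
    if skin < min_skin_bin:
        return None
    return positive_half, skin

# Synthetic record: clutter at zero doppler plus a target with negative doppler.
N = 64
record = [2.0 + cmath.exp(-2j * math.pi * 10 * n / N) for n in range(N)]
result = preprocess(record, clutter_bins=3, min_magnitude=1.0, min_skin_bin=5)
print(result[1], len(result[0]))
```

The synthetic record places the skin line at a negative doppler, so the sketch exercises the spectrum reversal before the positive half is kept.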

The resulting baseband amplitude spectrogram for Target 1 at zero de- 
gree azimuth is shown in Fig 16. The series of high amplitude spikes at 
about 300 Hz is the skin line of the vehicle. The lower amplitude, 
periodic spikes are due to running gear modulation. There is no signifi- 
cant signal beyond the second harmonic of the skin line. It is apparent 
that the amplitude of any particular spike varies with time in a random 
fashion. There is some time correlation, however, and a certain degree of 
frequency correlation exists among neighboring spikes. Both the time and 
frequency correlation are due to the scattering geometry and how fast it is 
changing. The geometry, in turn, depends on the terrain and the steering 
of the vehicle. 

Fig 17 depicts the spectrogram of the same vehicle at 135 degrees
azimuth aspect. As seems to be true in general, the time coherence of
the sideband spikes for this target is not as great at azimuths other than
zero or 180 degrees. This is probably due to the fact that at those two
angles the running gear rotation produces specular flashes. During the
period of this spectrogram, the vehicle is accelerating into the range
gate and then decelerating out of it.

The spectrogram signature of Target 2 is shown in Fig 18, with the
vehicle traveling through the range gate at 315 degrees aspect. The
tendency is for this type of target to have a lower level of side band
modulation than that of Target 1. These three dimensional spectrograms
with hidden lines were made with the Display Integrated Software System
and Plotting Language (DISSPLA) (Ref 34).

Feature Extraction 

Feature extraction may be formally defined as a transformation from
the pattern space to the feature space

$$A: G \rightarrow H \tag{58}$$

where A is the feature extraction transformation, G is the pattern space,
and H is the feature space. Typically G and H are both Euclidean spaces
with G being of higher dimensionality than H. In any case, H is always
a finite dimensional space with each coordinate representing a different
feature.

The feature extraction transformation should be considered to be a
filter that removes irrelevant information and retains information that is
pertinent to the classification problem. In an experimental context,
where the engineer is given a data base upon which to design a classifier,
feature extraction implies that artifacts in the data must be ignored.
Several such artifacts were observed in the data used in this research.

As an example of one such artifact consider a radar, with an automatic
gain control (AGC), having leakage from the power supply. If the pattern
space is the short-time Fourier spectrum of the radar signal, the leakage
will be manifest as a spike at the frequency of the power supply. The
AGC will cause the amplitude of the power supply spike to be inversely
related to the target strength. Thus the amplitude of the leakage Fourier
coefficient might be a useful indication of input signal strength, but
it would not be reliable as a feature for distinguishing between different
types of targets, since signal strength varies with many more
parameters than just target type.

Similarly, if the radar does not have an AGC, strong signals may
cause nonlinear distortion in the mixers, resulting in targets with
strong harmonics of the skin line. If, as is the case with the targets
considered in this research, certain targets tend to possess strong
second harmonics, and the data is contaminated by intermodulation
harmonics, this second harmonic may be a useful feature, but perhaps it
must be weighted rather lightly.

Because of the high dimensionality of the pattern space ($R^{512}$
initially), and the parametric nonstationarity of the signature,
classification in the space of amplitude spectra is not feasible. Therefore,
it is desirable to extract features which are somewhat invariant
with target parameters.

Some features that could be most useful for discrimination between
specific types of vehicles are suggested by the fine structure of the
short-time spectra. For example, if the spacing between the sideband
spikes could be determined, the period of the amplitude modulating
function could be extracted. Then, if the vehicle true velocity could be
accurately estimated, the ratio of amplitude modulation period to vehicle
velocity could be a reliable feature. Unfortunately, as indicated in
the spectrograms, the sideband spikes are not distinct at other than zero
or 180 degrees azimuth.
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An attempt was made to highlight the formants by integrating over 
several spectral observations. The procedure consisted of considering
several time-consecutive spectral signatures at the same azimuth angle.
Each observation was radial velocity normalized, since the skin line
frequency may change considerably from one integration period to the next.
Then, the normalized signatures were simply summed, with the hope that
the coherent signal would be emphasized, while any noise, being
incoherent, would not.

This procedure worked fairly well at zero and 180 degrees azimuth,
as shown by Fig 19, where the formants were already rather apparent.
In Fig 19, it may be noted that the lower side band spikes, between zero
and 500 Hz, are quite distinct. Unfortunately, the technique did not
emphasize the formants at other azimuths, as is apparent from Fig 20.
This is probably due to the lack of specular flashes from the rotating 
running gear at headings other than zero or 180 degrees. Because of 
these poor results, this scheme was abandoned; however, it could possibly 
be made workable by a more refined normalization and interpolation 

scheme than was used here. A better physical model that more closely 
simulates the scattering characteristics at other than cardinal headings 
would provide greater insight and might indicate what modifications might 
make an integration scheme practicable. 

The cepstral signature of the targets seemed an obvious choice for
extracting the period of the amplitude modulation, since a series of
evenly spaced spectral spikes will integrate into a single large spike
in the cepstrum at the period of the spacing. The cepstrum, first
described by Bogert and Tukey (Ref 5), is formally defined as the power
spectrum of the log power spectrum:

$$C(\tau) = \left| F\{\log |S(\omega)|^2\} \right|^2 \tag{59}$$

A recent tutorial paper on the subject was published by Childers
(Ref 10), or for an introduction to homomorphic signal processing, a
generalization of cepstral techniques, the reader may wish to consult
Oppenheim and Schafer (Ref 51).

The cepstra for a number of observations were computed and the
corresponding plots were examined. The results of this experiment
were not unlike those of the previous one. The amplitude modulation
period could easily be extracted for Target 1 at zero and 180 degrees
azimuth angles but could not generally be determined at other azimuths.
In Fig 21, the modulation period is represented by the distinct line
at about 0.035 seconds, while in Fig 22 no distinct modulation spike is
in evidence. The failure of the cepstrum to extract the modulation
period at the other azimuths is again due to the lack of distinct
formants. Because of these results, the cepstrum was abandoned as a
potential source of reliable, aspect invariant features.
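The behavior of the cepstrum in the favorable case, where the sideband spikes are distinct, can be reproduced with a synthetic signal. The sketch below is an idealized illustration and not the experiment reported here: the carrier bin, modulation period, floor value, and function names are all assumptions, and a direct DFT is used in place of the FFT. Because rahmonics appear at multiples of the period, the search is restricted to a band of quefrencies.

```python
import cmath
import math

def dft(x):
    n_pts = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / n_pts)
                for n, v in enumerate(x))
            for k in range(n_pts)]

def power_cepstrum(signal, floor=1e-6):
    """Power spectrum of the log power spectrum, following Eq (59).

    floor is a hypothetical guard value that keeps the logarithm finite
    where the power spectrum is numerically zero.
    """
    log_power = [math.log(max(abs(c) ** 2, floor)) for c in dft(signal)]
    return [abs(c) ** 2 for c in dft(log_power)]

def modulation_period(cepstrum, q_min, q_max):
    """Quefrency (in samples) of the largest cepstral value in [q_min, q_max]."""
    return max(range(q_min, q_max + 1), key=lambda q: cepstrum[q])

# Carrier amplitude modulated by a periodic function with two harmonics.
N, PERIOD, CARRIER_BIN = 64, 16, 20
signal = [
    (1.0 + 0.5 * math.cos(2 * math.pi * n / PERIOD)
     + 0.25 * math.cos(4 * math.pi * n / PERIOD))
    * cmath.exp(2j * math.pi * CARRIER_BIN * n / N)
    for n in range(N)
]
print(modulation_period(power_cepstrum(signal), 2, 24))
```

When the sidebands are smeared rather than distinct, as reported for azimuths other than zero and 180 degrees, no comparable cepstral spike should be expected.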

Since the attempts to use the fine structure of the individual
signatures met with so little success, it was decided to extract features
that represented gross characteristics of the signatures. In some
ways, this rejection of the fine structure in favor of more gross
characteristics is reminiscent of other researchers' attempts to find
the Gestalt of processes to be recognized. For example, Radoy (Ref 55)
used the low spatial frequency components of the images of alphanumeric
characters, with some success, as features in an optical character
recognition scheme. Also, researchers at Ohio State University (Ref 41)
in their aircraft identification studies have enjoyed greater success
utilizing low radar frequencies rather than higher ones. In both cases,
the low frequency (gross) information is being retained, and the high
frequency (fine detail) information is being discarded. In the optical
character recognition research, it was found that retention of the high
frequency components tended to confuse the issue. Too much information
was available, and the high frequency information only revealed the fine
detail of the structure, while the low frequency components contained the
gross form information. The aircraft identification problem is analogous.
The low frequency components are relatively aspect invariant and contain
information about the gross shape of the aircraft while the high
frequency information reveals the finer detail such as the tail structure,
engine nacelles, etc.

The first step in the adaptive feature extraction algorithm used
here is to parse the input spectrum into six bands as depicted in Fig 24.
Since the radar antenna is stationary during the data acquisition period,
the ground clutter is concentrated near the carrier frequency. Band $B_1$
includes all of the very low frequency components. The information in
this region is discarded, since the clutter dominates the signal here.
Next, the skin line is found, and a narrow frequency region, $B_3$, is
defined centered at the skin line frequency. Band $B_5$, of the same width
as $B_3$, is centered around the second harmonic of the skin
line. The lower side band, $B_2$, consists of all frequencies
between $B_1$ and $B_3$. The upper side band, $B_4$, is defined as all
frequencies between $B_3$ and $B_5$. Then the noise band, $B_6$, consisting
of all frequencies higher than those in $B_5$, is ignored.
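The band parsing step can be sketched as follows. The band numbering (1 for clutter through 6 for noise) is inferred from the feature definitions of Table I, and the parameters `clutter_width` and `line_half_width` are hypothetical, since the text does not state the widths that were actually used.

```python
def parse_bands(n_bins, skin_bin, clutter_width, line_half_width):
    """Split bin indices 0..n_bins-1 into six bands keyed 1..6.

    Returns a dict mapping band number to a list of bin indices.
    """
    harmonic_bin = 2 * skin_bin
    b3_lo, b3_hi = skin_bin - line_half_width, skin_bin + line_half_width
    b5_lo, b5_hi = harmonic_bin - line_half_width, harmonic_bin + line_half_width
    if not (clutter_width <= b3_lo and b3_hi < b5_lo and b5_hi < n_bins):
        raise ValueError("bands overlap or fall outside the spectrum")
    return {
        1: list(range(0, clutter_width)),      # ground clutter, discarded
        2: list(range(clutter_width, b3_lo)),  # lower side band
        3: list(range(b3_lo, b3_hi + 1)),      # skin line
        4: list(range(b3_hi + 1, b5_lo)),      # upper side band
        5: list(range(b5_lo, b5_hi + 1)),      # second harmonic of the skin line
        6: list(range(b5_hi + 1, n_bins)),     # noise, ignored
    }

bands = parse_bands(n_bins=512, skin_bin=100, clutter_width=5, line_half_width=2)
print({i: (b[0], b[-1]) for i, b in bands.items()})
```

Because the band edges are tied to the skin line bin, the partition adapts to the radial velocity of each observation, which appears to be the sense in which the algorithm is called adaptive.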

5 
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Two types of features were extracted from the data, the first kind 
being termed a spectral ratio feature and the second a shape factor. The
form of the spectral features was suggested by the norms on various
Banach spaces (Ref 42). Extracting the peak signal from a frequency band
corresponds to the norm on the space of continuous signals C[a,b]:

$$\| S(\omega) \|_{C[a,b]} = \max_{a \le \omega \le b} |S(\omega)| \tag{60}$$

Another quantity of interest, the signal voltage integrated over a
given band, is derived from the norm on the space of absolutely
integrable functions $L_1[a,b]$:

$$\| S(\omega) \|_{L_1[a,b]} = \int_a^b |S(\omega)| \, d\omega \tag{61}$$

The signal energy in a specified band coincides with the norm on the
space of square-integrable functions $L_2[a,b]$:

$$\| S(\omega) \|_{L_2[a,b]} = \left[ \int_a^b |S(\omega)|^2 \, d\omega \right]^{1/2} \tag{62}$$

The final quantity of interest is the total variation (T.V.) within a
given band, which is a part of the norm on the space of functions of
bounded variation BV[a,b]:

$$\mathrm{T.V.} = \int_a^b |dS(\omega)| \tag{63}$$

Although the operations listed above are to be performed on continuous
signals, the spectral signatures have been discretized, and thus discrete
versions of the operations were performed. Also, the actual features
are further normalized and averaged as described below.

The first 34 features could be termed spectral ratio features, since
they consist of the ratios of characteristics from different bands. The
use of these specific features was motivated by both the PAM model and an
examination of the spectra of the actual data. Several mathematical
quantities that must be calculated to evaluate these features are
presented here. One of the first quantities of interest is the peak signal
in a given band:

$$P(i) = \max_{n \in B_i} [S(n)] \tag{64}$$

where S(n) represents the discrete short-time amplitude (voltage)
spectrum. The peak signal in the side bands and the second harmonic
band appeared to contain class discriminating information. Another
attribute of importance is the total signal in each band:

$$T(i) = \sum_{n \in B_i} S(n) \tag{65}$$

The total signal and the mean signal in the various sidebands were used
as features. The width of each band is calculated according to

$$W(i) = \max_{n \in B_i}(n) - \min_{n \in B_i}(n) \tag{66}$$

The band energy is described by

$$E(i) = \sum_{n \in B_i} [S(n)]^2 \tag{67}$$

Finally, the total variation in each band is calculated by

$$V(i) = \sum_{n \in B_i} |S(n+1) - S(n)| \tag{68}$$

Total variation in the upper and lower sideband was used as a feature to
distinguish between the case when the energy is smeared throughout the
sidebands and the case when the sideband energy is concentrated in
distinct formants. If one of the quantities is to be computed over multiple
bands, it is represented, for example, as

$$T(i,j) = \sum_{n \in B_i \cup B_j} S(n) \tag{69}$$

The actual spectral ratio features used are shown in Table I. It may be
noted that in each case the features have been normalized by skin line
voltage or energy to reduce signal strength variation due to target range
or aspect changes or different receiver attenuation settings.
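The band quantities of Eqs (64) through (68) and a few of the lower side band ratio features can be sketched directly from their definitions. The function names are illustrative assumptions, `bands` is assumed to map a band number to its list of bin indices, and only the features whose definitions are legible in Table I (F(1), F(2), F(3), F(5), F(7)) are shown.

```python
def peak(spectrum, band):             # P(i), Eq (64)
    return max(spectrum[n] for n in band)

def total(spectrum, band):            # T(i), Eq (65)
    return sum(spectrum[n] for n in band)

def width(band):                      # W(i), Eq (66)
    return max(band) - min(band)

def energy(spectrum, band):           # E(i), Eq (67)
    return sum(spectrum[n] ** 2 for n in band)

def total_variation(spectrum, band):  # V(i), Eq (68)
    # The difference at the upper band edge reaches one bin past the band,
    # as Eq (68) is written; it is skipped only at the end of the spectrum.
    return sum(abs(spectrum[n + 1] - spectrum[n])
               for n in band if n + 1 < len(spectrum))

def lower_sideband_features(spectrum, bands):
    """F(1), F(2), F(3), F(5) and F(7) for the lower side band (band 2)."""
    p3 = peak(spectrum, bands[3])  # skin line voltage used for normalization
    f1 = peak(spectrum, bands[2]) / p3
    f2 = total(spectrum, bands[2]) / p3
    f3 = f2 / width(bands[2])
    f5 = energy(spectrum, bands[2]) / p3 ** 2
    f7 = total_variation(spectrum, bands[2]) / p3
    return {"F1": f1, "F2": f2, "F3": f3, "F5": f5, "F7": f7}

spectrum = [9.0, 1.0, 2.0, 1.0, 0.0, 4.0, 0.0, 1.0, 1.0, 0.0]
bands = {2: [1, 2, 3], 3: [4, 5, 6]}
print(lower_sideband_features(spectrum, bands))
```

Since every feature is a ratio against the skin line voltage or its square, multiplying the whole spectrum by a constant leaves the features unchanged, which is the purpose of the normalization noted above.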
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Six additional features are computed that could be termed shape 
factors, since they are functions of moments of the spectrum. These 
shape factors contain information about the gross, global structure
of each spectrum. The motivation for using these features stems
from the well known fact that any probability density function is
completely specified if all of its moments are known. Furthermore, the
greatest amount of information is typically found in the lower order
moments. Four mathematical functions are defined that are used in the
computation of these shape factors. The first function is the frequency
component number of the peak signal in a band:

$$R(i) = n \text{ such that } S(n) = P(i) \tag{70}$$

The mean for a band is defined as

$$M(i) = \Big[ \sum_{n \in B_i} n \, S(n) \Big] / T(i) \tag{71}$$

The variance for a band is

$$A(i) = \Big[ \sum_{n \in B_i} n^2 S(n) \Big] / T(i) - [M(i)]^2 \tag{72}$$

and the skewness is

$$C(i) = \Big[ \sum_{n \in B_i} n^3 S(n) \Big] / T(i) - 3 M(i) A(i) - [M(i)]^3 \tag{73}$$

The shape factor features are the last six defined in Table I. The shape
factors have all been normalized by the skin line frequency to remove
shape differences due to changes in radial velocity.
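The moment calculations of Eqs (71) through (73) and the normalized shape factors can be sketched as below. The function names are hypothetical, and the signed cube root is an assumption made here so that a negative third moment yields a real-valued skewness feature; the text does not say how that case was handled.

```python
import math

def band_moments(spectrum, band):
    """Mean M, variance A and third central moment C of Eqs (71)-(73)."""
    t = sum(spectrum[n] for n in band)
    m = sum(n * spectrum[n] for n in band) / t
    a = sum(n ** 2 * spectrum[n] for n in band) / t - m ** 2
    c = sum(n ** 3 * spectrum[n] for n in band) / t - 3.0 * m * a - m ** 3
    return m, a, c

def signed_cube_root(x):
    # Assumption: keep the sign of the third moment when taking the cube root.
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def shape_factors(spectrum, band, skin_bin):
    """Mean difference, standard deviation and skewness, each divided by R(3)."""
    m, a, c = band_moments(spectrum, band)
    return ((m - skin_bin) / skin_bin,
            math.sqrt(max(a, 0.0)) / skin_bin,
            signed_cube_root(c) / skin_bin)

# A spectrum that is symmetric about the skin line at bin 10.
symmetric = [0.0] * 21
symmetric[9], symmetric[10], symmetric[11] = 1.0, 2.0, 1.0
print(shape_factors(symmetric, range(5, 16), skin_bin=10))
```

For a spectrum that is symmetric about the skin line, the mean difference and skewness vanish, so nonzero values of these features indicate an imbalance between the lower and upper side bands.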


Feature Selection

After the designer has extracted a set of potentially useful features
from the preprocessed signals of interest, he is frequently faced

Table I. Features

Symbol   Name                        Definition
F(1)     Peak 2                      P(2)/P(3)
F(2)     Total 2                     T(2)/P(3)
F(3)     Mean 2                      F(2)/W(2)
F(4)     Peak energy 2               [F(1)]^2
F(5)     Total energy 2              E(2)/[P(3)]^2
F(6)     Mean energy 2               F(5)/W(2)
F(7)     Total variation 2           V(2)/P(3)
F(8)     Mean variation 2            F(7)/W(2)
F(9)     Peak 4                      P(4)/P(3)
F(10)    Total 4                     T(4)/P(3)
F(11)    Mean 4                      F(10)/W(4)
F(12)    Peak energy 4               [F(9)]^2
F(13)    Total energy 4              E(4)/[P(3)]^2
F(14)    Mean energy 4               F(13)/W(4)
F(15)    Total variation 4           V(4)/P(3)
F(16)    Mean variation 4            F(15)/W(4)
F(17)    Peak 2,4                    Max[F(1),F(9)]
F(18)    Total 2,4                   F(2)+F(10)
F(19)    Mean 2,4                    F(18)/[W(2)+W(4)]
F(20)    Peak energy 2,4             Max[F(4),F(12)]
F(21)    Total energy 2,4            F(5)+F(13)
F(22)    Mean energy 2,4             F(21)/[W(2)+W(4)]
F(23)    Total variation 2,4         F(7)+F(15)
F(24)    Mean variation 2,4          F(23)/[W(2)+W(4)]
F(25)    Peak 5                      P(5)/P(3)
F(26)    Peak energy 5               [F(25)]^2
F(27)    Peak 2,4,5                  Max[F(17),F(25)]
F(28)    Total 2,4,5                 F(18)+T(5)/P(3)
F(29)    Mean 2,4,5                  F(28)/[W(2)+W(4)+W(5)]
F(30)    Peak energy 2,4,5           Max[F(20),F(26)]
F(31)    Total energy 2,4,5          F(21)+E(5)/[P(3)]^2
F(32)    Mean energy 2,4,5           F(31)/[W(2)+W(4)+W(5)]
F(33)    Total variation 2,4,5       F(23)+V(5)/P(3)
F(34)    Mean variation 2,4,5        F(33)/[W(2)+W(4)+W(5)]
F(35)    Mean difference 2,3,4       [M(2,3,4)-R(3)]/R(3)
F(36)    Standard deviation 2,3,4    [A(2,3,4)]^(1/2)/R(3)
F(37)    Skewness 2,3,4              [C(2,3,4)]^(1/3)/R(3)
F(38)    Mean difference 2,4         [M(2,4)-R(3)]/R(3)
F(39)    Standard deviation 2,4      [A(2,4)]^(1/2)/R(3)
F(40)    Skewness 2,4                [C(2,4)]^(1/3)/R(3)


i.e. feature selection. Typically, more features are extracted than
can be conveniently used in the classification process. The primary
limitation on the number of features that may be used results from the
finite sample size of the training set.

If an excessive number of features is used for a given set of
observations, overtraining on the design set will occur. The relationship
between the number of sample observations available and the optimal
number of features to be used is not well understood, even though there
is an extensive literature available on the subject. The overtraining
will be manifest in one of two ways: either an excessively optimistic
evaluation of the selected features' usefulness or an actual decline in
performance over that achieved with a smaller number of features.

Foley (Ref 23) has reported the results of an experiment in which
he arbitrarily separated samples arising from an artificially generated,
single, multivariate uniform density into two classes. By selecting
a large number n of features in relation to the number k of samples per
class, he could obtain good separation of the data using the Fisher
linear discriminant, even though all of the data was generated from the
same probability density function. For a ratio of k/n=0.36, he obtained
perfect separation. Foley concluded that, on the basis of his experiments,
the ratio k/n should be greater than or equal to three to avoid this
type of anomaly.

The second type of manifestation of overtraining, and the one that is
frequently reported in experimental pattern recognition exercises, is an
increase in classifier performance as a function of n out to a certain
number followed by a decrease as more features are added. The number of
features at which peak performance occurs is referred to as the optimal
computational complexity by Chandrasekaran (Ref 8). This phenomenon does
not occur for Bayes classification with known densities.
In order to avoid overtraining and to keep on-line feature extraction
and classification costs at a minimum, the designer is faced with the
problem of how to select the best m features out of the n that were
extracted from the design set data. It would be easy enough to estimate
the performance of each feature individually (given the form of the
classifier to be used) and rank order the features based on their
performance. However, if the n extracted features were so ordered,
the m individually highest performing features would not necessarily
comprise the set of m optimal features (Ref 14). Cover and Van Campenhout
(Ref 16) have shown, further, that there exist jointly Gaussian
probability density functions on which all possible probability of error
orderings can occur among subsets of several measurements subject to a
monotonicity constraint. They conclude that "no known nonexhaustive
sequential m-measurement procedure is optimal." Exhaustive search is
generally out of the question because of the astronomical number of
combinations that must be evaluated. If n features have been extracted
and it is desired that m of these be selected for classification purposes,
then the number of combinations that must be evaluated is equal to

$$\binom{n}{m} = \frac{n!}{m!\,(n-m)!} \tag{74}$$

As Stearns (Ref 64) has noted, typically n is of the order of 100 and m is
of order 10, resulting in about $17 \times 10^{12}$ combinations.
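The size of the exhaustive search in Eq (74) is easy to check numerically. The function name below is illustrative only.

```python
import math

def subsets_to_evaluate(n, m):
    """Number of m-feature subsets of n features, Eq (74)."""
    return math.factorial(n) // (math.factorial(m) * math.factorial(n - m))

print(subsets_to_evaluate(100, 10))
```

Even at a hypothetical rate of one million subset evaluations per second, a count of this size corresponds to roughly 200 days of computation.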
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Fukunaga and Narendra (Ref 26) have advanced a branch and bound 


algorithm for feature subset selection that restricts the domain of the
exhaustive search. The technique consists of the search of a tree whose
nodes represent the sets of features included in the evaluation. The root
of the tree represents the set of all features, and each of its n
successors represents the full set with a different feature removed.
Each succeeding set of nodes represents sets reduced by one feature
over their predecessor. The nodes with no successors represent the sets
of m features. The savings in the top-down search procedure arise
because the successors of a node need not be evaluated if its performance
falls below an already established lower bound. As Fukunaga notes, the
scheme only works if the performance criterion is monotonic on the number
of features used. Performance improvement of a subset of features over
the superset from which they are drawn cannot be allowed. Thus, this
feature selection technique will work well when performance measures
that presuppose exact knowledge of the statistics of the processes, such
as Bhattacharyya distance or divergence, are used. However, its
optimality will not be retained for practical problems where probability of
error is the performance criterion, and an optimal computational
complexity exists.
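The top-down tree search described above can be sketched compactly. This is a generic illustration of the branch and bound idea under stated assumptions, not the algorithm of Ref 26 in detail: the function name, the toy scores, and the pair bonus are all hypothetical, and the criterion is constructed to be monotonic so that pruning is valid.

```python
from itertools import combinations

def branch_and_bound(n, m, criterion):
    """Best m-subset of features 0..n-1 under a monotonic criterion.

    criterion(subset) must never increase when a feature is removed;
    otherwise the pruning step below is not valid.
    """
    best = {"value": float("-inf"), "subset": None}

    def search(subset, start):
        value = criterion(subset)
        if value <= best["value"]:
            return  # no descendant of this node can beat the current bound
        if len(subset) == m:
            best["value"], best["subset"] = value, subset
            return
        # Remove features in increasing position so each subset is visited once.
        for pos in range(start, len(subset)):
            search(subset[:pos] + subset[pos + 1:], pos)

    search(tuple(range(n)), 0)
    return best["subset"], best["value"]

# Hypothetical criterion: nonnegative individual scores plus a nonnegative
# bonus for one complementary pair, which keeps the criterion monotonic.
SCORES = [5.0, 4.0, 3.0, 3.0, 1.0]
PAIR_BONUS = {(2, 3): 4.0}

def criterion(subset):
    value = sum(SCORES[i] for i in subset)
    for (i, j), bonus in PAIR_BONUS.items():
        if i in subset and j in subset:
            value += bonus
    return value

print(branch_and_bound(len(SCORES), 2, criterion))
exhaustive = max(combinations(range(len(SCORES)), 2), key=criterion)
print(exhaustive)
```

In this toy case the two individually best features (0 and 1) are not the best pair, which echoes the earlier point that ranking features individually does not guarantee an optimal subset.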

Since the only optimal feature selection technique for a practical 
pattern recognition problem is exhaustive search which is computationally 
exorbitant, heuristic search procedures are required. Mucciardi and 
Gose (Ref 46) have provided a comparison of seven heuristic feature
selection techniques. Stearns (Ref 64) has proposed a "plus m,
take away n" search algorithm that is intuitively appealing. Others, e.g.
Chang (Ref 9), have applied dynamic programming to the feature selection
problem. Stepwise discriminant analysis, as described in the next section,
was chosen as the feature selection technique to be used for the present
problem, because it results in a simple classification rule. Also, it
is relatively robust as long as the data is unimodal by classes.
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Stepwise Discriminant Analysis 


Stepwise discriminant analysis (SDA) is a technique of feature
selection and evaluation. From a training set of extracted features,
SDA will select the most useful (in a specified sense) features and
evaluate the discriminating power of those features using linear
discriminant functions. There is no reason to believe that SDA is an
optimum procedure, in any analytical sense, for the problem at hand,
nor is it optimum for almost any problem to which it is applied.
Although SDA is generally suboptimal for selecting features, the optimum
technique is too expensive in terms of computation to consider. The use
of SDA has been widely reported in the literature with good results,
for example, by Mohn (Ref 45).


Besides performing feature selection, SDA effects classification
using linear discriminant functions. Nilsson, in his classic book
(Ref 49), details the theory of linear discriminant functions for pattern
recognition. It is well known that linear discriminant functions are
not optimal for classification except for certain types of class
conditional probability density functions, including jointly Gaussian
with equal covariance matrices (Ref 18:29). Use of linear discriminant
functions may be justified on the basis that if acceptable performance
results, then such discriminants are useful, though suboptimal.


The goal of SDA is to find a linear discriminant function that
maximizes the ratio of between-class scatter to within-class scatter.
Since it is an iterative process, adding or deleting a feature on each
pass, it is referred to as stepwise. The motivation for the procedure
is based on the classical work of Fisher (Ref 21).


The input to the SDA algorithm is a set of n-dimensional feature
vectors $X_1, X_2, \ldots, X_N$ assigned to two or more classes
$\omega_1, \omega_2, \ldots, \omega_m$. The technique is a supervised
learning method since the class labels are known for each feature vector;
however, it is nonparametric, since the class conditional probability
density functions are not known. The output of SDA is a set of linear
discriminant functions that optimizes a separability criterion.

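The data are real and three types of statistics are computed:

1. The overall sample mean vector

$$\bar{X} = \frac{1}{N} \sum_{j=1}^{N} X_j \qquad (75)$$

2. The class sample mean vectors

$$\bar{X}_i = \frac{1}{k_i} \sum_{X \in \omega_i} X \qquad (76)$$

where $k_i$ equals the number of samples assigned to $\omega_i$.

3. The unbiased estimates of the class variances for each feature

$$s_{ij}^2 = \frac{1}{k_i - 1} \sum_{X \in \omega_i} (x_j - \bar{x}_{ij})^2 \qquad (77)$$

where i refers to class number and j to the feature number, with $x_j$
the j-th component of X and $\bar{x}_{ij}$ the j-th component of
$\bar{X}_i$.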


At each step of the analysis, two types of overall scatter matrices
are computed. First, the individual class scatter matrices are evaluated

$$S_i = \sum_{X \in \omega_i} (X - \bar{X}_i)(X - \bar{X}_i)^T, \qquad i = 1, 2, \ldots, m \qquad (78)$$

from which the (pooled) within-class scatter matrix is formed

$$S_W = \sum_{i=1}^{m} S_i \qquad (79)$$

The within-class scatter matrix, as its name implies, gives some measure
of the total within-class scatter. The total scatter matrix

$$S_T = \sum_{X} (X - \bar{X})(X - \bar{X})^T \qquad (80)$$

indicates the total scatter of all of the samples. For the scalar case,
it may be noted that the within-class scatter is the sum of the class
variances, each scaled by $(k_i - 1)$, and the total scatter is the
overall variance scaled by $(N - 1)$.


It may also be observed that

$$S_T = \sum_{i=1}^{m} \sum_{X \in \omega_i} (X - \bar{X}_i + \bar{X}_i - \bar{X})(X - \bar{X}_i + \bar{X}_i - \bar{X})^T$$

$$\quad = \sum_{i=1}^{m} \sum_{X \in \omega_i} (X - \bar{X}_i)(X - \bar{X}_i)^T + \sum_{i=1}^{m} \sum_{X \in \omega_i} (\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})^T$$

$$\quad = S_W + \sum_{i=1}^{m} k_i (\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})^T \qquad (81)$$

The second term in Eq (81) is frequently termed the between-class scatter
matrix $S_B$ (Ref 18:119), and thus total scatter is the sum of the
within-class scatter and the between-class scatter

$$S_T = S_W + S_B \qquad (82)$$
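
The decomposition of total scatter into within-class and between-class
scatter can be checked numerically. The following is a minimal sketch in
plain Python, assuming each class is supplied as a list of equal-length
feature vectors; the function names are illustrative and are not taken
from the BMD program.

```python
def mean_vector(samples):
    """Component-wise mean of a list of feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def outer_sum(samples, center):
    """Sum of (x - center)(x - center)^T over the samples."""
    d = len(center)
    s = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [xi - ci for xi, ci in zip(x, center)]
        for r in range(d):
            for c in range(d):
                s[r][c] += diff[r] * diff[c]
    return s

def scatter_matrices(classes):
    """classes: one list of feature vectors per class.

    Returns (S_W, S_B, S_T): within-class, between-class, total scatter.
    """
    all_samples = [x for cls in classes for x in cls]
    grand_mean = mean_vector(all_samples)
    d = len(grand_mean)
    s_w = [[0.0] * d for _ in range(d)]
    s_b = [[0.0] * d for _ in range(d)]
    for cls in classes:
        class_mean = mean_vector(cls)
        s_i = outer_sum(cls, class_mean)              # class scatter matrix
        b_i = outer_sum([class_mean], grand_mean)     # one between-class term
        for r in range(d):
            for c in range(d):
                s_w[r][c] += s_i[r][c]                # pooled within-class
                s_b[r][c] += len(cls) * b_i[r][c]     # weighted by class size
    s_t = outer_sum(all_samples, grand_mean)          # total scatter
    return s_w, s_b, s_t
```

For any input, every element of `s_t` should equal the corresponding
element of `s_w` plus `s_b` up to floating-point rounding.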


The method of selecting those features to be added or deleted at each
step of the analysis is based on the ratio of within-class scatter to
total scatter. This quantity, which is the ratio of the determinants of
the within-class and total scatter matrices, is referred to as Wilks'
$\Lambda$-criterion (Ref 36:77)

$$\Lambda(Y) = \frac{\det S_W(Y)}{\det S_T(Y)} \qquad (83)$$

where Y is the vector of features that have been selected and det
represents determinant. Wilks' $\Lambda$-criterion has values between
zero and unity, with larger values indicating poor separation, while
smaller ones indicate good separation. At each step, the features are
divided into two disjoint groups, those that have been selected at some
previous step and those that have not. A partial $\Lambda$-statistic is
defined

$$\Lambda(u \cdot Y) = \frac{\Lambda(Y, u)}{\Lambda(Y)} \qquad (84)$$

where u is a feature that has not been selected as yet. This quantity
gives a measure of improvement of the expanded feature set over
the original set. A corresponding F-statistic is computed

$$F = \frac{N - m - p}{m - 1} \cdot \frac{1 - \Lambda(u \cdot Y)}{\Lambda(u \cdot Y)} \qquad (85)$$

where p is the number of features included in the analysis. The
F-statistic is used to control the addition of new features or the
deletion of already selected features. The F-statistic is called either
the F-to-enter statistic for the entry of u into the set
$(y_1, y_2, \ldots, y_p)$ or the F-to-remove statistic for the deletion
of u from the set $(y_1, y_2, \ldots, y_p, u)$. A new feature will not be
added if its F-statistic is below a specified threshold, or an old one
will not be deleted if its F-statistic is above a specified threshold.
Both thresholds are input parameters in the computer program.
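
As an illustration of how the criterion and the F-to-enter statistic
might be evaluated, the sketch below computes them from given
within-class and total scatter matrices. It is a hedged example, not the
BMD code: the function names are hypothetical, p is taken to be the
number of features already selected before u is added, and the criterion
for an empty feature set is assumed to be one.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result            # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def submatrix(matrix, idx):
    """Rows and columns of `matrix` restricted to the indices in `idx`."""
    return [[matrix[r][c] for c in idx] for r in idx]

def wilks_lambda(s_w, s_t, features):
    """det S_W / det S_T restricted to the listed feature indices."""
    if not features:
        return 1.0                      # assumed convention for the empty set
    return det(submatrix(s_w, features)) / det(submatrix(s_t, features))

def f_to_enter(s_w, s_t, selected, u, n_samples, n_classes):
    """Partial lambda and F statistic for adding feature u to `selected`.

    `selected` is a list of already chosen feature indices.
    """
    p = len(selected)                   # assumption: count before adding u
    partial = (wilks_lambda(s_w, s_t, selected + [u])
               / wilks_lambda(s_w, s_t, selected))
    f = ((n_samples - n_classes - p) / (n_classes - 1)
         * (1.0 - partial) / partial)
    return partial, f
```

With a single candidate feature and no features selected, the statistic
reduces to the familiar one-way analysis of variance F ratio.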

At each step of the analysis, one feature is added or deleted
according to the following three rules:

1. Remove the feature with the smallest F-to-remove value unless
this value is greater than or equal to the F-to-remove threshold.

2. If it is not possible to remove a feature, add the feature with
the highest F-to-enter statistic that is greater than or equal to the
F-to-enter threshold.

3. If it is not possible to delete or add a feature, the stepping 
procedure is complete. 
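
The three rules above can be sketched as a single decision function. This
is an illustrative reading of the rules, not the BMD code; the function
name, the dictionary inputs, and the numeric values in the usage note are
hypothetical.

```python
def sda_step(f_to_remove, f_to_enter, remove_threshold, enter_threshold):
    """Apply the three stepping rules once.

    f_to_remove: {feature: F-to-remove} for features already selected.
    f_to_enter:  {feature: F-to-enter} for features not yet selected.
    Returns ("remove", feature), ("add", feature) or ("stop", None).
    """
    if f_to_remove:
        weakest = min(f_to_remove, key=f_to_remove.get)
        if f_to_remove[weakest] < remove_threshold:       # rule 1
            return ("remove", weakest)
    if f_to_enter:
        strongest = max(f_to_enter, key=f_to_enter.get)
        if f_to_enter[strongest] >= enter_threshold:      # rule 2
            return ("add", strongest)
    return ("stop", None)                                 # rule 3
```

For example, `sda_step({}, {28: 594.2, 21: 100.0}, 3.0, 4.0)` selects
feature 28 for entry, since nothing can be removed and its F-to-enter
value exceeds the threshold.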


After the iterations have ceased, a linear discriminant function
is computed for each class:

$$g_i(X) = \bar{X}_i^T \hat{\Sigma}^{-1} X - \tfrac{1}{2}\, \bar{X}_i^T \hat{\Sigma}^{-1} \bar{X}_i + \ln P(\omega_i) \qquad (86)$$

where X is an arbitrary feature vector composed of the features selected
by SDA, and $\bar{X}_i$ is the sample mean vector for those features. The
pooled within-class sample covariance matrix in Eq (86) is

$$\hat{\Sigma} = \frac{S_W}{N - m} \qquad (87)$$

for the selected features. The linear discriminant functions defined by
Eq (86) are optimal for Gaussian class conditional densities with equal
covariance matrices (Ref 18:29).
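
A classifier of this kind can be sketched as follows. The example assumes
the standard equal-covariance form of the discriminant, namely the class
mean times the inverse pooled covariance applied to X, minus half the
quadratic form in the class mean, plus the log prior; that form, and all
function names, are assumptions made for illustration rather than a
transcription of the BMD program.

```python
from math import log

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    aug = [[float(v) for v in a[r]] + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def make_discriminants(class_means, pooled_cov, priors):
    """Return one (weights, bias) pair per class, for g(x) = w.x + bias."""
    rules = []
    for mean, prior in zip(class_means, priors):
        w = solve(pooled_cov, mean)     # inverse covariance times class mean
        bias = -0.5 * sum(wi * mi for wi, mi in zip(w, mean)) + log(prior)
        rules.append((w, bias))
    return rules

def classify(x, rules):
    """Index of the class whose discriminant is largest at x."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in rules]
    return max(range(len(scores)), key=scores.__getitem__)
```

Only one weight vector and one constant need be stored per class, which
is the sense in which the rule is cheap to implement.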

For a more rigorous and complete presentation of SDA, as well as
a flow chart, the reader is referred to Jennrich (Ref 36). The actual
computer program used is BMD07M, which is contained in the BMD Biomedical
Computer Programs series (Ref 17).

Besides the linear discriminant functions, other outputs from
BMD07M include a classification matrix, and posterior probabilities and
Mahalanobis distances for each sample and class. Also, a scatter plot
of the data is made using the two best variables produced by a
Karhunen-Loeve expansion (Ref 25:226-233) on the pooled covariance matrix.
The primary disadvantage of using SDA for feature selection and
estimation of the performance of the features thus selected is that
it only uses second order statistics. As long as the experimental
data is unimodal for each class, this is not a serious drawback.
Frequently, estimates of higher order statistics are rather unreliable for
experimental data in any case. Application of SDA to a wide variety of
feature selection problems has demonstrated its robustness. Also, the
linear discriminant classifier is simple and cheap, in terms of memory and
computation time, to implement.


Nearest Neighbor Classification Rule 

The concept of geometric nearness as a measure of similarity is
implicit in all statistical pattern recognition formulations. The nearest
neighbor (NN) classification rule, first proposed by Fix and Hodges
(Ref 22), elegantly captures this concept. The NN training set consists
of a number of d-dimensional feature vectors $\{X_1, X_2, \ldots, X_n\}$
that are specified to belong to two or more classes, say
$\omega_1, \omega_2, \ldots, \omega_m$. A suitable metric
$d(\cdot,\cdot)$ is defined on the space to which the feature vectors
belong. A test vector Y of unknown class label is assigned to the class
$\omega_c$ such that

$$X_i \in \omega_c \quad \text{where} \quad d(Y, X_i) \le d(Y, X_j), \qquad j = 1, 2, \ldots, n \qquad (88)$$

The vector of unknown classification is assigned to the class to which
its nearest neighbor belongs.
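
A minimal sketch of the rule, assuming Euclidean distance and a training
set held as (feature vector, label) pairs; the function name and the data
layout are illustrative choices.

```python
from math import dist

def nearest_neighbor_label(y, training):
    """Return the label of the training vector closest to y.

    training: list of (feature_vector, class_label) pairs.
    """
    _, label = min(training, key=lambda pair: dist(y, pair[0]))
    return label
```

Every training vector is visited for each test vector, which is the
source of the storage and computation cost associated with the rule.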
The NN rule has been applied to very diverse classification problems,
probably because of its intuitive appeal, its ease of implementation, and
its robustness with respect to the underlying statistics of the data. The
NN classification rule can yield acceptable performance when other
classifiers fail, e.g. when applied to data having multimodal probability
density functions. Use of the NN rule requires no knowledge of the
underlying statistics of the data, and thus it is termed a nonparametric
technique. Because of its structure, the rule can provide clustering
information about the data.


Cover and Hart (Ref 15) have shown that the NN rule asymptotic
probability of error, P, is bounded above by twice the Bayes probability
of error, P*. If $P_n(e)$ is the NN classification average probability of
error for a design set containing n samples, then the asymptotic error
rate is

$$P = \lim_{n \to \infty} P_n(e) \qquad (89)$$

Cover and Hart then demonstrate that

$$P^* \le P \le P^* \left( 2 - \frac{m}{m-1} P^* \right) \qquad (90)$$

where m is the number of classes. The lower bound on P is obviously the
Bayes error rate. This near optimality of the NN rule for large training
sets, then, provides further justification for its wide use for
classification and feature evaluation.
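
The Cover and Hart bounds are easy to evaluate for a given Bayes error.
The helper below is a small sketch; its name is hypothetical, and the
input check relies on the fact that the Bayes error for m classes cannot
exceed (m - 1)/m.

```python
def nn_asymptotic_bounds(bayes_error, n_classes):
    """Return (lower, upper) bounds on the asymptotic NN error rate."""
    m = n_classes
    if m < 2 or not 0.0 <= bayes_error <= (m - 1) / m:
        raise ValueError("need m >= 2 and 0 <= P* <= (m - 1) / m")
    upper = bayes_error * (2.0 - m / (m - 1) * bayes_error)
    return bayes_error, upper
```

For two classes and a Bayes error of 0.05 the upper bound is 0.095, just
under twice the Bayes error; the two bounds coincide when the Bayes error
is zero or at its maximum.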

The expected probability of error for the NN rule is easy to compute
and is reliable (assuming the design set is representative), since each
training set sample may be considered individually as a test set of one
observation. This method of estimating the expected probability of error
is a generalization of the U method described by Toussaint (Ref 66), who
compares its statistical properties to those of other methods.

The procedure is to determine the number of samples in each class that
have nearest neighbors outside of the class, divide by the number of
samples in the class, multiply by the class prior probability, and sum
over all the classes. To express this formally, it is first convenient
to define a set membership function for the nearest neighbor of a
given feature vector $X_j \in \omega_c$

$$\delta(X_j, X_j') = \begin{cases} 1, & \text{if } X_j' \notin \omega_c \\ 0, & \text{if } X_j' \in \omega_c \end{cases} \qquad (91)$$

where $X_j'$ is the nearest neighbor of $X_j$, i.e.,

$$d(X_j, X_j') = \min_{i \ne j} d(X_j, X_i) \qquad (92)$$

Then the expected probability of error is

$$P(e) = \sum_{c=1}^{m} \frac{P(\omega_c)}{n_c} \sum_{X_j \in \omega_c} \delta(X_j, X_j') \qquad (93)$$

where $P(\omega_c)$ is the prior probability of class c and $n_c$ is the
number of samples assigned to $\omega_c$. It should be noted that the
metric most frequently used with the NN rule, and the one used in this
work, is Euclidean distance:

$$d(X, Y) = \| X - Y \| = \left[ (X - Y)^T (X - Y) \right]^{1/2} \qquad (94)$$
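
The estimate can be sketched as a leave-one-out loop over the design set.
The code below is an illustrative implementation under stated
assumptions: Euclidean distance, a `priors` dictionary with one entry per
class label, and at least one sample in every class; the function name is
hypothetical.

```python
from math import dist

def nn_leave_one_out_error(samples, labels, priors):
    """Prior-weighted fraction of samples whose nearest neighbor
    (excluding the sample itself) belongs to another class."""
    counts = {c: 0 for c in priors}
    errors = {c: 0 for c in priors}
    for j, (x, c) in enumerate(zip(samples, labels)):
        # Nearest neighbor of sample j among all other samples.
        neighbor = min((i for i in range(len(samples)) if i != j),
                       key=lambda i: dist(x, samples[i]))
        counts[c] += 1
        errors[c] += labels[neighbor] != c   # membership function, 0 or 1
    return sum(priors[c] * errors[c] / counts[c] for c in priors)
```

Because every sample is compared with every other sample, the cost grows
with the square of the design set size.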


The greatest drawback to using a NN classifier for on-line pattern
recognition is the large amount of computational resources required.
Each of the design set vectors must be stored, and all of the distances
must be computed for each test vector. To alleviate this computational
requirement somewhat, Hart (Ref 33) has introduced the condensed nearest
neighbor (CNN) rule. The CNN rule uses an iterative procedure to find
only those samples that are near the class boundaries. By retaining only
those samples, the CNN classification procedure has the possibility of
reducing the on-line computational requirements considerably, without
affecting performance significantly.
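
One common statement of the condensing procedure is sketched below: a
store is seeded with one sample, and any sample that the store
misclassifies under the NN rule is added to it, with passes repeated
until a full pass adds nothing. This is an illustration under those
assumptions, with a hypothetical function name, and is not claimed to
match the details of Hart's published algorithm.

```python
from math import dist

def condense(samples, labels):
    """Return indices of a subset that classifies every sample correctly
    under the NN rule (a consistent subset)."""
    store = [0]                          # seed with the first sample
    changed = True
    while changed:
        changed = False
        for j in range(len(samples)):
            if j in store:
                continue
            nearest = min(store, key=lambda i: dist(samples[j], samples[i]))
            if labels[nearest] != labels[j]:
                store.append(j)          # misclassified, so keep it
                changed = True
    return store
```

The result depends on the order in which samples are presented, and
samples far from the class boundaries tend to be left out of the store.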

The k nearest neighbor (KNN) rule is an obvious extension of the NN
classification rule. As the name implies, this rule simply assigns a
test vector to the class which is most frequently represented among its k
nearest neighbors. A simple vote is taken to determine the class label
for the test vector. The asymptotic upper bound for the KNN probability
of error may not be expressed simply; however, the asymptotic performance
is monotonic on k. Duda and Hart (Ref 18:105) provide a chart depicting
asymptotic KNN performance as a function of Bayes error for various values
of k.
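
The vote can be sketched in a few lines, again assuming Euclidean
distance and a training set of (feature vector, label) pairs. The
function name is illustrative, and the tie-breaking behavior (the tied
class met first in order of increasing distance wins) is a choice made
here, not something specified in the text.

```python
from collections import Counter
from math import dist

def knn_label(y, training, k=3):
    """Majority vote among the k training vectors nearest to y."""
    nearest = sorted(training, key=lambda pair: dist(y, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to the NN rule; an odd k avoids ties in the two
class case.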

Results 


To test the efficacy of the extracted features, several classification
experiments were conducted using SDA and NN classification. In all of
the experiments described in this section, unless otherwise indicated,
the feature vectors consisted of the forty features, previously described,
extracted from a 512 point, positive, amplitude spectrum. Data Base A
was comprised of 620 samples of Target 1, 296 of Target 2, and 471 of
Target 3. Data Base B had an additional 131 samples of Target 1 and 53
of Target 2 plus 83 samples of Target 4, 18 of Target 5 and 65 of Target
6. Table XV in Chapter V summarizes the various experiments conducted.


In the first set of experiments, Data Base A was used with Targets
2 and 3 united into a single class, since they are of similar type. Thus
for this two class problem there were 620 samples in the first class and
767 in the second. Experiment 1 consisted of the application of SDA to
this data base with equal prior class probabilities assigned. Table II
shows which features were selected or removed at each step. The F
statistic, as defined in the section on SDA, controls the entry or
removal of features. The features are as defined in Table I in the section
on feature extraction. As can be seen from this table, F(28), the total
signal in the sidebands including the second harmonic, is the best single
feature. Feature F(21) is the total energy in the lower and upper
sidebands, while F(23) is the total variation in those two bands. Features
F(36) and POSS) are both shape factors. Table III shows the features
included at selected steps, ranked in order of F-statistic, i.e., the
"best" feature is first followed by the remainder in descending order of
usefulness. It is interesting to note that F(28) is a reliable feature
when few features are utilized, but it is of less use when a larger number
is selected. The maximum number of features selected is thirty six rather
than forty, because the SDA algorithm rejected the remaining four as being
too highly correlated with those already included. The average probability
of error as a function of the number of features selected is shown in
Fig 24. The average probability of error decreases monotonically from
about .21 for a single feature to .061 for thirty six features. The
monotonic decrease in probability of error demonstrates that the
computational complexity of this experiment is greater than thirty six
features, which might be expected because of the large size of the design
set classes. This figure also shows the diminishing utility of each
additional feature. The use of five features reduces the probability of
error to less than ten percent. The addition of thirty one more features
only lowers the error by a few percentage points.

Table III. Features Included for Selected Steps in Experiment 1

STEP NUMBER    FEATURES INCLUDED, ORDERED ON F

Fig 24. Probability of Error as a Function of the Number of Features
Used for Experiment 1

The thirty six feature SDA classification matrix for Experiment 1
is shown in Table IV. The error bias toward class 1 is caused by several
phenomena. Qualitatively speaking, the difference between the sample
spectra of the two classes is that those from class 1 tend to have more
sideband modulation than those of class 2. When conditions cause class
1 signatures to have less sideband modulation, they are misclassified.
As previously noted, the Target 1 modulation spectra at zero and 180
degrees azimuth angles appear different from those at other aspects.
This difference is reflected by the fact that twenty seven out of fifty
eight samples at those two cardinal aspect angles were misclassified.
Another eighteen of the errors occurred with the target backing up. Of
the remaining twenty misclassified samples, a large percentage occurred
either early or late in the run, indicating the target may have been
only marginally within the radar gate. Fine tuning the skin line
amplitude threshold could possibly eliminate some of these errors.

Table IV. Confusion Matrix for Experiment 1

CLASSIFIED AS

The performance cited in this experiment and in subsequent
experiments could be criticized as being optimistic due to testing on the
training set. However, because of the large size of the training set
classes, performance would not be significantly different if a small
number of samples were withheld for testing purposes. Furthermore, the
nearest neighbor classification experiments did not test on the training
set, and the performance was not much different from the SDA results.

A scatter plot for Experiment 1 is depicted in Fig 25. The data is
projected onto the best two coordinates arising from the SDA canonical
analysis. The asterisks indicate the class means, and the dollar signs
represent two-class overlap. This plot indicates that the two classes
are both essentially unimodal.

In a target identification application, a pattern recognition system
[...] be designed to withhold a decision until several observations are
[...] sequential analysis was performed on the results [...] the
nonstationary nature of [...] each sample, and then a simple majority
vote was taken to determine the final decision. The results [...]
monotonically from [...] observation to [...] for seven [...]
classification decision was based on [...] observations, the [...]
samples misclassified, and all the [...] class 1 were [...]
training set. Almost ten percent of the samples were at these two
vehicle aspects. The runs with aspects nearest these were 22.5 degrees 
off, and no problem was encountered with those runs. If the exceptional 
behavior is assumed to extend for only five degrees either side of the 
cardinal headings, and all aspects are assumed equally likely, then 

the cardinal heading runs should only account for about five percent of 
the total samples. Thus the error rate would only be about half the 
indicated one. 
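
The multiple-look decision used in these experiments, a simple majority
vote over several single-look decisions, can be sketched as follows. The
grouping of looks into overlapping runs of consecutive observations is an
assumption made for this example, as are the function names; ties, which
can occur with more than two classes, go to the label seen first.

```python
from collections import Counter

def majority_decision(single_look_labels):
    """Return the label chosen most often among the looks."""
    return Counter(single_look_labels).most_common(1)[0][0]

def multi_look_decisions(labels_in_time_order, looks):
    """Vote over each run of `looks` consecutive single-look decisions."""
    last_start = len(labels_in_time_order) - looks
    return [majority_decision(labels_in_time_order[i:i + looks])
            for i in range(last_start + 1)]
```

Isolated single-look errors are voted out, while a sustained run of
errors, such as a whole pass at a troublesome aspect, is not.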

Experiment 2 was the same as the first except that the Target 1
samples at zero and 180 degrees were included in a separate class. The
SDA iterations stopped again after thirty six features had been
selected with an overall probability of error of .053 for a single
observation. The confusion matrix is shown in Table V, where A
represents the Target 1 samples at zero and 180 degrees.

Table V. Confusion Matrix for Experiment 2

CLASSIFIED
A
32

The probability of error given above did not include class 1/class A
errors. It is interesting to note that less than one percent of the
class 2 targets were misclassified. Also, the vast majority of class A
targets that were misidentified were called class 2. The scatterplot for
this experiment, shown in Fig 27, provides more insight into the
structure of the data.

Experiment 3 consisted of applying the NN classification rule to
the data as described in Experiment 1. The average probability of error
vs the number of observations is shown in Fig 26. The probability of
error decreased from .097 for one look to .029 for seven. The NN
performance was inferior to that of SDA for one and three looks but was
about the same for five and seven. The confusion matrix is presented in
Table VI with the first number representing one-look performance, and
the second, seven-look. Again, the Target 1 runs at zero and 180 degrees
proved troublesome, causing twenty four percent of the single look errors.
The runs where Target 1 was in reverse caused a significant portion of
the remaining errors, twenty nine percent of the single look and thirty
one percent of the seven look errors.

Table VI. Experiment 3 Confusion Matrix

                     CLASSIFIED AS
                     1            2
TRUE      1       529/588       91/32
CLASS     2        36/5        741/762
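
The quoted error rates can be checked against the off-diagonal entries of
Table VI. The sketch below assumes equal class priors, as in Experiment
1, and uses the class sizes of 620 and 767 samples stated for the two
class problem; the function name is illustrative.

```python
def average_error(misclassified, class_sizes, priors):
    """Prior-weighted mean of the per-class error rates."""
    return sum(p * e / n
               for e, n, p in zip(misclassified, class_sizes, priors))

# Off-diagonal entries of Table VI: (class 1 errors, class 2 errors).
one_look = average_error([91, 36], [620, 767], [0.5, 0.5])
seven_look = average_error([32, 5], [620, 767], [0.5, 0.5])
```

Rounded to three places these evaluate to .097 and .029, in agreement
with the one-look and seven-look figures given in the text.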

Experiment 4 was identical to Experiment 3 except a 3NN rule was used
for classification. The performance of the 3NN rule was not significantly
different from that of the single NN rule, as demonstrated in Fig 26. The
3NN rule did, however, require twenty eight percent more computer
resources to run.

Experiment 5 was an application of SDA to the same data base used in
Experiment 1 except that only the upper sideband and second harmonic
features were used. This included features F(9) through FQ1O) and F(25)
and F(26). The motivation for this experiment was to determine how the
performance might be degraded by distributed clutter in the lower sideband
that could be experienced by an airborne radar. If the signal to clutter
ratio was poor, the lower sideband spectral ratio features and the shape
factors would be useless. The performance for this experiment improved
from a probability of error of .128 for a single observation to .067 for
seven.


Experiment 6 was an application of SDA to Data Base A with each of
the three targets considered as a separate class. Thus class 1 had 620
samples; class 2 had 296; and class 3 had 471. Each class was considered
to have equal prior probability. Table VII shows a summary of the SDA
iterations. The SDA performance for this three class problem is shown in
Fig 28. The relatively poor performance in this experiment is due to
the similarity between the modulation spectra of Targets 2 and 3 as
reflected in the confusion matrix, Table VIII, for a single observation.
If class 2/class 3 errors are not counted, the performance is about the
same as for the previously described two class problem. The scatter plot
for Experiment 6 is shown in Fig 29. Table IX reflects the features,
ranked in descending order of F-statistic, selected by SDA at various
steps. Feature F(36), the "standard deviation" of the lower and upper
sidebands plus the skinline band, was consistently the top feature for
this experiment.

Fig 28. Three Class Performance, Experiments 6, 7, and 8

Table VII. Summary of SDA Iterations for Experiment 6

STEP      FEATURE    F VALUE     NUMBER OF             PROBABILITY
NUMBER    ENTERED    TO ENTER    VARIABLES INCLUDED    OF ERROR

Table VIII. Experiment 6 Confusion Matrix

                  CLASSIFIED AS
                  1       2       3
TRUE      1     545      21      54
CLASS     2       2     202      92
          3       8      95     368

Experiments 7 and 8 consisted of, respectively, single and three
NN classification applied to the same data base used in Experiment 6.
The results are summarized in Fig 28 along with those from Experiment
6. In general the NN results were slightly better than SDA on this
data base.
Table IX. Features Selected at Various Steps for Experiment 6

STEP NUMBER    FEATURES INCLUDED, ORDERED ON F

$8,4515,1,12,30,40,1
30 $6,359, 22,37, 35,9,28,38,4,31,14,12,
6,40 8,1, 25,20, 53,17; 21.37, 18,

Fig 29. Scatterplot for Experiment 6

In the remaining series of experiments the training set consisted
of Data Base A and Data Base B. This expanded data base consisted of
751 samples of Target 1, 349 of Target 2, 471 of Target 3, 83 of Target
4, 18 of Target 5, and 65 of Target 6. Targets 1 and 4 are of similar
type with respect to the structures that cause the modulation, while
the remaining four are somewhat similar.

Experiment 9 consisted of SDA of a two class problem with class 1
comprised of Targets 1 and 4 and class 2 containing the samples from
the remaining four targets. A summary of the features selected and the
probability of error at certain steps is given in Table X. The table
reveals that feature F(36) was once again the best single feature. Table
XI shows the features included at selected steps ordered on the F
statistic. A scatter plot for Experiment 9 is depicted in Fig 30. The
final confusion matrix for this experiment using 34 variables is shown
in Table XII.


In Experiment 10 the six targets were each considered as separate
classes with prior probabilities equal to the ratio of class samples
to total samples. The features selected by SDA are shown in Table XIII.
The confusion matrix for a single observation and thirty five features is
shown in Table XIV. It is apparent from the confusion matrix that the
classifier performance on Target 6 is very poor. As with all Data Base B
targets, signatures from this target were only available at forty five
degree increments. At zero and 180 degrees, the Target 6 spectrum had
extremely low levels of sideband modulation while at the aspects forty
five degrees off the cardinal headings, the sideband modulation levels
were moderately high. Thus the features for Target 6 arise from a
bimodal distribution with the cardinal heading signature being over
represented. The probability of error for Experiment 10 is plotted vs
number of observations in Fig 31, along with the results considering
Targets 2 and 3 as belonging to the same class. The six class probability
of error for seven looks was .164. The Experiment 10 scatter plot is
shown in Fig 32.

Table X. A Summary of SDA Iterations for Experiment 9

STEP      FEATURE              F VALUE TO         NUMBER OF            PROBABILITY
NUMBER    ENTERED  REMOVED     ENTER OR REMOVE    FEATURES INCLUDED    OF ERROR
1         36                   594.16             1                    0.259
2 2 129.0 2
5 3 PES .2% 5
{ sa) 73.18 {
S 8 4.74 4 0.162
6 Lo 8.45 o
4 28.20 7

Table XI. Features Selected at Various Steps for Experiment 9


Table XIV. Experiment 10 Confusion Matrix

                       CLASSIFIED AS
                 1      2      3      4      5      6
TRUE      1    690     3]     So    S38     18     18
CLASS     2     11    228    107      2      1      0
          3     13     79    378      0      0      1
          6     12     19     12      1      0     21


In the final experiment, 11, the same data base as in
Experiment 10 was used. In this experiment a tree structure was imposed
on the classifier as suggested by Meisel (Ref iS and SS). First,
the data was classified into the two major groups as in Experiment 9 and
then final classification was made. Since no significant difference in
performance over Experiment 9 resulted, the results will not be detailed.
The scatter plots of the two major groups are of interest and are shown
in Fig 33 and Fig 34. The similarity between Targets 2 and 3 is manifest
in Fig 34.

V Conclusion

This concluding chapter presents a summary of the work performed,
conclusions to be drawn, and recommendations for further related research.

Summary

This dissertation has presented a method of performing automatic
recognition for the class of radar targets that possess periodically
amplitude modulated signatures. The field of automatic target
identification was surveyed, with particular attention given to radar. A
philosophical discussion of pattern recognition followed by a procedure
for designing pattern recognition systems was presented. The nature
of the periodically amplitude modulated phenomenon was investigated, and
the data bases used for experimentation were described. The form of the
optimal classifier for this type of problem was examined, but it was
concluded that such a classifier was not realizable due to inadequacies
in the physical model and lack of adequate statistical information.
Thus, a suboptimal frequency domain classification algorithm was
designed. Features that were relatively invariant with respect to target
parameters were heuristically extracted from the short-time amplitude
spectra of target signatures. Performance was presented for both linear
discriminant and nearest neighbor classifiers.

Conclusions

The periodically amplitude modulated signature model of vehicular
radar targets introduced here appears to qualitatively account for the
observed doppler spread. The model fits best at azimuth aspects of zero
and 180 degrees, where the specular scattering from the modulating
structure is greatest. The modulation is essentially frequency band
limited to the band between the carrier frequency and twice the target
doppler frequency. The implication is that information only out to the
second harmonic of the skin line need be retained for classification
purposes.

Based on both physical considerations and empirical observations,
the obvious choice for the pattern space is the space of short-time
Fourier amplitude spectra. The effects of the amplitude modulation are
much more manifest in the frequency domain than in the time domain.
Also, since the signal is essentially band-limited, it tends not to be
time-limited. With the advent of FFT processors on an IC chip,
the required preprocessing can be done quickly and cheaply. For the
type of features used in this work, the 1024 point FFT with Hanning
window provided sufficient resolution and stability.
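
The preprocessing step described above can be illustrated with a
minimal sketch in Python (NumPy assumed). It computes short-time
amplitude spectra with a 1024 point FFT and a Hanning window. The
function name, the 50 percent frame overlap, and the restriction to a
real-valued input are assumptions made for this example; they are not
taken from the experiments reported here.

```python
import numpy as np


def short_time_amplitude_spectra(x, n_fft=1024, hop=512):
    """Return one amplitude spectrum per frame (rows) of a real signal.

    n_fft matches the 1024 point FFT named in the text; hop (frame
    advance in samples) is an assumed value for illustration only.
    """
    x = np.asarray(x, dtype=float)
    if x.size < n_fft:
        raise ValueError("signal is shorter than one FFT frame")
    window = np.hanning(n_fft)              # Hanning (Hann) taper
    n_frames = 1 + (x.size - n_fft) // hop  # whole frames only
    spectra = np.empty((n_frames, n_fft // 2 + 1))
    for i in range(n_frames):
        frame = x[i * hop : i * hop + n_fft]
        # Magnitude of the one-sided FFT of the windowed frame
        spectra[i] = np.abs(np.fft.rfft(frame * window))
    return spectra
```

Each row of the returned array is one short-time amplitude spectrum;
features such as the spread about the skin line would be computed from
these rows.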

Probabilities of error of under ten percent were achieved on five
and fewer class problems by taking more than one observation. Table
XV summarizes the classification results obtained for selected
experiments. The most reliable overall feature is F(36), which is a
measure of the spread of the spectrum about the skin line. Except for
Experiment 5, typically about 35 features were used for estimating
performance in Table XV, but performance should not be significantly
affected by using only ten. Since there is no significant difference in
performance between NN and SDA classifiers, the data for each class is
essentially unimodal in the feature space. The very slight edge in
performance of the 3NN over the NN classifier does not justify the
increased computational expense. Although the cepstrum was not judged
useful at all aspect angles, it could be used at zero and 180 degree
aspect angles when the spectral features are less reliable. In all
experiments, performance improved considerably as an increased number
of observations was considered. The decrease in probability of error
tended to flatten out at about seven looks.
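
To make the comparison between the NN and 3NN classifiers and the use
of several looks concrete, the following is a minimal sketch in Python
(NumPy assumed). The Euclidean distance, the function names, and in
particular the majority vote used to combine looks are assumptions for
illustration; the combination rule actually used in the experiments is
not restated in this chapter.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=1):
    """Label one feature vector by vote among its k nearest neighbors.

    k=1 gives the NN classifier and k=3 the 3NN classifier.
    Euclidean distance is an assumed choice of metric.
    """
    diffs = np.asarray(train_x, dtype=float) - np.asarray(query, dtype=float)
    dist = np.linalg.norm(diffs, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def combine_looks(labels):
    """Assumed rule: majority vote over the labels of successive looks.

    Ties are resolved in favor of the label that was seen first.
    """
    return Counter(labels).most_common(1)[0][0]
```

The 3NN classifier needs the same distance computations as the NN
classifier plus a vote, so its cost is at least that of the NN
classifier for each decision.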

Table XV. Summary of Classification Results

      Data        No.                  Best                                     Mean P(e)
Exp.  Base        Classes  Classifier  Feature  Best 10 Features                1 Look  7 Looks
1     A           2        SDA         28       36,25,38,26,21,29,28,34,37,23   .061    .026
3     A           2        NN                                                   .097    .029
4     A           2        3NN                                                  .089    .021
5     [illegible] 2        SDA         10                                       .128    .067
6     A           3        SDA         36       [illegible], 12,14,22           .219    .150
7     A           3        NN                                                   .208    .126
8     A           3        3NN                                                  .203    .133
9     A&B         2        SDA         36       36, [illegible]                 .128    .062
10    A&B         5        SDA                                                  .153    .077
10    A&B         6        SDA         15       [illegible]                     .205    .164

Experiment 5 indicated that the techniques outlined here could possibly
be extended to airborne radar with some success.

The results of this research indicate that it would be feasible
to identify PAM targets by major type using the techniques applied.
Classification of many types of targets into specific classes would
require the use of more of the fine structure of the spectral
signatures. Such an effort would require a more careful study of the
sources of the periodic modulation and the noise sources.


Recommendations for Further Research 


No claims of optimality are made for the techniques proposed in
this thesis. Because of the nonstationarity of the random processes
of interest and the lack of a complete physical model, an optimal
solution does not appear likely in the near future. Since heuristic
procedures are called for, there may well be approaches to the problem
that will yield better results. It is obvious that further physical
investigations into the nature of the amplitude modulation and the
sources of noise would produce a more complete model.

An extension of the techniques proposed here would be a sensitivity
analysis to determine the optimal order of FFT to be used. The 1024
point FFT used in this work yields a theoretical frequency resolution
of about two Hz and an actual resolution nearer four Hz when the finite
window effect is considered. A lower order FFT would sacrifice some
resolution but would produce a smoother spectrum. Also, the shorter
the FFT integration period, the more nearly stationary the process would
be during the period, resulting in less smearing of the short-time
spectrum. Two other side benefits of a lower order FFT would be reduced
computational requirement for preprocessing and feature extraction, and
[illegible in source]. If an experiment were conducted to calculate
performance in terms of average probability of error as a function of
FFT order, performance might improve as FFT order was reduced from 1024
to a certain order, and then it would begin to deteriorate as the
frequency resolution became too poor.
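
The trade-off between resolution and integration period can be
tabulated with a short Python sketch. The sampling rate of 2048 Hz is
an assumption inferred from the figures quoted above (1024 points and
about two Hz per bin); it is not stated in this chapter. The widening
factor of two is likewise an assumed value, chosen to match the
"nearer four Hz" effective resolution quoted for the windowed
transform.

```python
def fft_resolution(n_fft, fs=2048.0, mainlobe_bins=2.0):
    """Return (bin spacing in Hz, effective resolution in Hz,
    integration period in seconds) for an n_fft point transform.

    fs and mainlobe_bins are assumed values for illustration.
    """
    bin_spacing = fs / n_fft                  # theoretical resolution
    effective = mainlobe_bins * bin_spacing   # widened by the window
    period = n_fft / fs                       # time spanned by one frame
    return bin_spacing, effective, period


if __name__ == "__main__":
    for n in (1024, 512, 256, 128):
        df, eff, period = fft_resolution(n)
        print(f"N={n:5d}  bin={df:5.1f} Hz  effective={eff:5.1f} Hz  "
              f"period={period:6.4f} s")
```

Under these assumptions, each halving of the FFT order doubles the bin
spacing and halves the integration period, which is the trade-off the
proposed sensitivity analysis would explore.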

Another experiment that would be of interest would be to estimate
the short-time spectra of the processes using the autoregressive spectral
estimator rather than the FFT. Kaveh (Ref 40) has used this technique
for high resolution velocity estimation of radar targets in the presence
of extended clutter. Autoregressive (linear predictive) spectral
estimation is also widely used in speech analysis (Ref 43). The
periodically amplitude modulated radar waveform bears a striking
similarity to the speech signal. The order of the autoregressive
estimator to be used could be determined by considering the maximum
number of spikes that would be expected in the spectrum. The resulting
spectrum would be a smoothed version, and no computational effort would
be wasted, as for the FFT, in computing the amplitude values of
frequency components that are not of interest.
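
As an illustration of the autoregressive estimator suggested above, the
following is a minimal sketch in Python (NumPy assumed) based on the
Yule-Walker equations. The choice of the Yule-Walker method, the biased
autocorrelation estimate, and the function name are assumptions made
for this example; the procedures of Refs 40 and 43 are not reproduced
here.

```python
import numpy as np


def ar_spectrum(x, order, n_freq=512):
    """Yule-Walker AR power spectrum estimate of a real sequence.

    Returns (freqs, psd) with freqs in cycles per sample on [0, 0.5).
    Model: x[n] = sum_k a[k] x[n-k] + e[n], k = 1..order.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    # Biased autocorrelation estimates r[0], ..., r[order]
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with R[i, j] = r[|i - j|]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - np.dot(a, r[1:])  # prediction error variance
    freqs = np.arange(n_freq) / (2.0 * n_freq)
    k = np.arange(1, order + 1)
    # |1 - sum_k a[k] exp(-j 2 pi f k)|^2 at each frequency
    denom = np.abs(1.0 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ a) ** 2
    return freqs, sigma2 / denom
```

Only `order` coefficients are estimated per frame, and the spectrum
can then be evaluated at just the frequencies of interest, which is the
saving noted in the text.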

The final recommendation for further research is to determine how
significantly the identification performance would be degraded by
distributed ground clutter as encountered in an airborne radar. This
could be done by either gathering airborne data or by developing a
suitable ground clutter model that could be added to the existing data.


A model for distributed clutter that might be of use has been proposed
by Ringel (Ref 56). If the identification procedure can be extended to
the airborne problem, further work would be required to relate the
probability of error to aspect angle. Such a study might suggest optimum
flight paths for aircraft engaged in an identification task.